Title: Databricks Migration and Support Engineer
Multiple Locations: Seattle, WA / Dallas-Plano, TX / St. Louis, MO

Must Have Technical/Functional Skills
• Has successfully executed a data migration or modernization to Databricks, preferably IBM DataStage to Databricks on AWS.
• Should have experience handling large migrations to Databricks.
• Should have good analytical skills to compare the legacy and modern data platforms end to end, from source to target.
• Good understanding of the Databricks implementation of the medallion architecture.
• Has independently led and managed large Databricks migrations.
• CI/CD Integration: Implement version control (e.g., Git) and automated deployment processes for Databricks assets

Required technical and architectural skills are listed below.
Core Data Engineering Languages
• Experience in advanced SQL for building modular analytics workflows, using Common Table Expressions (CTEs), and writing high-performance queries in Databricks SQL Analytics.
• Experience in Python or Scala to build, optimize, and debug complex data transformation scripts, custom functions, and machine learning pipelines.
Big Data & Architecture Core
• Experience in the Apache Spark ecosystem, including understanding of cluster execution flow, memory allocation, driver/worker nodes, and working with DataFrames.
• Experience in Delta Lake Architecture to understand ACID transactions on object storage, data skipping, partition strategies, and automated data compaction.
Databricks Platform Expertise
• Experience in Delta Live Tables (DLT) & Workflows for constructing and orchestrating production-ready, declarative streaming and batch ETL pipelines.
• Experience in Unity Catalog for setting up data governance, column/row-level access control, and tracking end-to-end data lineage across workspaces.
• Experience in Auto Loader for implementing modern, incremental data ingestion patterns from cloud blob storage into the lakehouse.
Code Translation & Refactoring
• Pipeline Conversion: Translate visual DataStage Parallel Jobs and Sequences into Python/PySpark scripts or Databricks notebooks
• Legacy Refactoring: Modernize legacy logic rather than applying "lift and shift" anti-patterns; adapt workflows to think in distributed DataFrames rather than DataStage stages.
• Logic Mapping: Map DataStage components (such as Aggregator, Join, Transformer, and Sort stages) to equivalent Spark operations
Testing & Reconciliation
• Validation & Reconciliation: Build automated reconciliation frameworks to compare row counts, checksums, and aggregate sums between legacy DataStage outputs and new Databricks outputs
• Data Cleansing: Identify and resolve data type discrepancies, null-handling differences, and encoding issues during the extraction and loading phases
Platform Orchestration & Governance
• Orchestration: Replace DataStage sequence jobs with Databricks Workflows (or external orchestrators such as Azure Data Factory or Airflow) to schedule and manage dependencies
• Data Governance: Enforce data lineage, security, and cataloging using Unity Catalog to ensure compliance in the new Lakehouse environment.

Good to Have
Cloud Infrastructure & CI/CD
• Cloud Providers (AWS): Understanding of the underlying cloud object storage, identity and access management (IAM), and network security configurations.
• DevOps & Bundles: Familiarity with Databricks Asset Bundles (DABs) and CI/CD tools to automate the deployment of workspaces and pipeline assets.
Legacy Assessment & Migration Mechanics
• Code Conversion & Translation: The ability to parse legacy code structures from ETL pipelines (Informatica or, preferably, DataStage) and refactor them into Databricks-native code.
• AI-Assisted Migration: Skills in using AI coding assistants and open framework agent tools to analyze application interdependencies, automate schema mapping, and accelerate lift-and-shift workloads
• Experience working in Agile teams and an understanding of data governance frameworks.

Responsibilities

Support the post-migration environment following the move from IBM DataStage to Databricks
Incident & Lifecycle Management
• CI/CD Deployment: Support code deployments across Development, Test, and Production environments using Databricks Repos and REST APIs
• Monitoring & Alerting: Set up monitoring via Databricks System Tables and observability tools to catch job failures, data anomalies, or latency spikes early
Pipeline Maintenance & Orchestration
• Workflow Management: Transition from DataStage job sequences to native Databricks Workflows for scheduling, dependency tracking, and alerts
• ETL Refactoring: Troubleshoot and fix issues in generated PySpark or Spark SQL code that replaced legacy DataStage Transformer or Lookup stages
• Streaming & Batch Integration: Support ongoing data ingestion using Databricks Auto Loader to process files continuously from cloud storage
Performance Tuning & Cost Optimization
• Compute Management: Monitor and configure serverless or classic clusters to prevent over-provisioning
• Query Optimization: Analyze Spark execution plans. Replace inefficient row-by-row processing logic (a common DataStage carryover) with vectorized operations and native Spark functions
• Storage Optimization: Maintain Delta Lake tables by enforcing layout optimization (ZORDER)
Data Governance & Security
• Access Control: Implement granular permissions, column masking, and row-level filters using Databricks Unity Catalog to replace DataStage's legacy security policies
• Data Quality: Utilize Delta Live Tables (DLT) to build pipelines with built-in, declarative data quality expectations and monitoring
Additional Skills
• Excellent communication skills
• Ability to collaborate with legacy and modernized application teams and stakeholders
