We are seeking a Senior Data Engineer (open to Principal level) to modernize and take ownership of a critical data pipeline within a large-scale healthcare analytics environment. This role will focus on transitioning legacy SAS-based pipelines to modern Python/PySpark on Databricks, while driving engineering best practices and scalable data solutions.
This is a hands-on engineering role with a strong emphasis on development capabilities. Candidates with application development experience and exposure to AI/automation technologies will stand out.
Key Responsibilities
Lead the modernization of data pipelines from SAS to Python/PySpark on Databricks
Own and evolve a mission-critical HEDIS data pipeline used for performance measurement and reporting
Design, build, and optimize scalable data pipelines in a distributed environment
Collaborate with SMEs during an initial knowledge transfer period, with eventual full pipeline ownership
Develop, schedule, and automate end-to-end data workflows
Ensure data quality, reliability, and performance across large datasets
Partner with cross-functional teams and analytics vendors to deliver high-quality data outputs
Contribute to best practices in version control, CI/CD, and agile development workflows
Required Qualifications
Strong development/engineering background (core requirement)
Hands-on experience with Python (scripting and application development)
Expertise in building and managing data pipelines and ETL workflows
Experience processing large-scale datasets in distributed environments
Proficiency with Databricks (notebooks, workflows, cluster management)
Solid experience with AWS services including S3, Lambda, Glue, and EC2
Strong SQL skills for complex transformations and data extraction
Experience with pipeline orchestration and automation
Familiarity with version control systems (Git) in a collaborative environment
Experience managing work via issues, epics, and agile tooling
Preferred / Nice-to-Have
Experience with AI, machine learning, or automation frameworks
Exposure to healthcare data (e.g., HEDIS)
Background in transitioning legacy systems to modern data platforms
What We Are Looking For (Priority Order)
Strong development/engineering capabilities (must-have)
Application development experience, especially Python scripting
Expertise in AI or automation (highly desirable bonus)

Lead Data Engineer
