- Role Overview & Key Responsibilities
- Data Pipeline Operations & On-Call : Own on-call rotation for ingestion pipelines (Kafka, AWS Glue); triage and resolve pipeline failures, schema mismatches, and throughput degradation; author RCAs.
- Data Quality Monitoring : Implement and maintain data quality checks across Bronze->Silver->Gold lakehouse layers (S3->Kafka->Snowflake/Redshift); alert on anomalies, missing data, or drift.
- ML Model Health & MLOps : Monitor deployed models for accuracy degradation, data drift, and concept drift; manage model redeployment workflows; maintain ML experiment tracking.
- AI Platform Reliability (Bedrock + LangChain) : Monitor AWS Bedrock inference latency, token usage, error rates, and cost; operate LangChain agent pipelines; use Langfuse for AI evaluation and observability.
- DORA Metrics - Data & AI Lens : Track deployment and release health for data pipeline and model updates; measure lead time for data model changes; monitor pipeline reliability as a DORA proxy.
- Schema & Contract Management : Monitor AWS Glue Schema Registry for schema evolution events; validate Avro contract compliance for new producer payloads; coordinate schema changes with module teams.
- Snowflake / Redshift Operations : Manage query performance, warehouse sizing, cost controls, and data retention policies; monitor Gold-layer data freshness and SLA compliance.
- Incident Escalation : Serve as first-line triage for all data and AI incidents; escalate to core data/ML engineers only when root cause requires architectural changes or new feature work.
- Required Skills & Experience
- Data Engineering (Strong)
- 4+ years of data engineering experience with production-grade pipelines
- Proficient with Apache Kafka: consumer groups, topic management, lag monitoring, DLQ handling
- Experience with AWS Glue, AWS Glue Schema Registry, and Avro/Parquet data formats
- Hands-on with Snowflake or Redshift: query optimization, cost management, RBAC
- Familiarity with lakehouse patterns: Bronze/Silver/Gold (S3-based) data architecture
- ML/AI Operations (Core Competency)
- Experience with MLOps practices: model versioning, drift detection, retraining pipelines
- Familiarity with AWS Bedrock, SageMaker, or equivalent managed ML inference platforms
- Working knowledge of LangChain or LlamaIndex for LLM application pipelines
- Experience with AI/LLM observability tools (Langfuse, LangSmith, or equivalent)
- Understanding of RAG (Retrieval-Augmented Generation) architectures and vector stores
- Operational Excellence (Core Competency)
- Experience applying DORA metrics to data and ML delivery pipelines
- On-call experience for data infrastructure; structured incident management and RCA
- Data quality framework implementation: Great Expectations, dbt tests, or custom checks
- Experience with monitoring and alerting for streaming pipelines (Kafka lag, throughput)
- Backend & AWS Exposure
- Python proficiency - scripting, pipeline development, data transformation
- AWS services: S3, Lambda, Glue, Bedrock, CloudWatch, SQS/SNS, IAM
- Familiarity with containerized workloads on Kubernetes (EKS)
- Experience with dbt or similar data transformation frameworks is a plus
- Nice to Have
- Exposure to ontology or knowledge graph systems (RDF, OWL, or property graphs)
- Familiarity with Temporal for workflow orchestration of ML pipelines
- Experience with multi-tenant data platforms and row-level security patterns
- Understanding of GDPR-compliant data handling and encryption key management
AI/ML Data Engineer (with AWS)
Rivago Infotech Inc