Job Title:
ML Platform Engineer - GPU Infrastructure
Job Summary
- Support the team by designing, implementing, and maintaining the automation and ML workload enablement layer of the GPU cluster platform. This role focuses on optimizing GPU compute environments for AI/ML training and Isaac Sim simulation workloads, integrating GPU jobs into CI/CD pipelines, standardizing runtime environments, and supporting reliable storage and artifact management.
Experience
- 3+ years of experience in ML Platform Engineering, DevOps, Infrastructure Engineering, or a related field
- Bachelor's or Master's degree in Systems Engineering, Computer Science, Computer Engineering, or a related discipline
Responsibilities
- Support GPU cluster platforms for AI/ML and simulation workloads
- Optimize GPU compute environments for ML training and Isaac Sim execution
- Integrate GPU workload execution into CI/CD pipelines
- Standardize runtime environments using containers and automation tools
- Manage storage, artifacts, and workload outputs
- Troubleshoot and improve platform reliability, scalability, and performance
- Collaborate with ML, infrastructure, and engineering teams
Required Skills
- Experience with Linux, Kubernetes, Docker, and GPU infrastructure
- Knowledge of CI/CD tools and automation scripting (Python/Bash)
- Experience supporting AI/ML workloads and distributed systems
- Familiarity with NVIDIA GPU technologies and containerized environments
- Strong troubleshooting and performance optimization skills
Preferred Skills
- Experience with Isaac Sim or simulation workloads
- Exposure to cloud platforms (AWS, Azure, or GCP)
- Knowledge of monitoring and observability tools such as Grafana or Prometheus
Optimal Staffing