Hi,

I hope you are doing well.

Please review the job description below and, if you are interested, apply with your latest resume. Our goal is not just to submit your profile but to secure an interview for you. If this email or job description has reached you in error, please accept my sincere apologies and disregard it. Notify us by email and we will immediately remove you from consideration.
- Position: Capacity Engineer with Data Engineer
- Location: Remote
- Employment type: Full Time
- Experience: 10-15 years
Please screen for the skills below in the order listed.
Tools & Techniques
- Data Engineering Stack: SQL, Python, Spark, Airflow for data processing and orchestration (3-4 years of experience; slightly less is acceptable).
- Monitoring & Observability: Prometheus, Grafana, Datadog.
- Chaos Engineering: Test system resilience under stress.
- Infrastructure as Code: Terraform, Ansible, Harness.
We are looking for a Data Engineer with strong Site Reliability Engineering (SRE) expertise in capacity planning. This role ensures our infrastructure scales efficiently to meet user demand, balancing performance with cost. The engineer will forecast growth, analyze usage trends, and automate resource provisioning to prevent outages, over-provisioning, and under-provisioning. The role also requires building robust data pipelines and analytical models to support forecasting and decision-making.
Key Responsibilities
- Data Pipeline Development: Design and maintain ETL/ELT pipelines to collect, transform, and store infrastructure usage data.
- Data Modeling: Build models to analyze system metrics and predict future resource needs.
- Demand Forecasting: Analyze historical usage patterns to predict CPU, memory, and storage requirements.
- Load Testing & Scaling: Simulate traffic spikes to identify bottlenecks and ensure systems scale linearly.
- Cost Efficiency: Optimize resource allocation to avoid unnecessary costs while maintaining service availability.
- Automation: Use Infrastructure as Code (IaC) tools like Terraform to automate scaling and provisioning.
- Architecture Review: Collaborate with software teams to flag single points of failure and ensure resilient service design.
Tools & Techniques
- Monitoring & Observability: Prometheus, Grafana, Datadog.
- Chaos Engineering: Test system resilience under stress.
- Infrastructure as Code: Terraform, Ansible, Harness.
- Data Engineering Stack: SQL, Python, Spark, Airflow for data processing and orchestration.
Qualifications
- Strong background in data engineering and SRE practices.
- 10-15 years of experience.
- Hands-on experience with capacity planning, forecasting, and scaling.
- Proficiency in IaC tools (Terraform, Ansible, Harness).
- Experience with data pipelines, ETL/ELT frameworks, and big data tools.
- Familiarity with monitoring/observability platforms (Prometheus, Grafana, Datadog).
- Knowledge of chaos engineering and resilience testing.
- Excellent collaboration and communication skills.
Capacity Engineer with Data Engineer
IMCS Group