Mobius by Gaian

Mobius - DevOps Engineer - Kubernetes/Docker

Hyderabad, Telangana, India

7 days ago

Summary

About The Role

We are seeking an experienced DevOps Engineer to join our infrastructure team, with a strong focus on managing and optimizing GPU-based compute environments for machine learning and deep learning workloads. In this role, you will be responsible for the end-to-end infrastructure lifecycle, from provisioning with Terraform/Ansible to deploying ML models using modern frameworks such as Hugging Face.

Responsibilities:

  • Manage infrastructure using Terraform and Ansible
  • Deploy and monitor Kubernetes clusters with GPU support (including NVIDIA drivers and H100 SXM integration)
  • Implement and manage inferencing frameworks such as Ollama, Hugging Face, etc.
  • Support containerization (Docker), logging (EFK), and monitoring (Prometheus/Grafana)
  • Handle GPU resource scheduling, isolation, and scaling for ML/DL workloads
  • Collaborate closely with developers, data scientists, and ML engineers to streamline deployments

Skill Set:

  • 5-8 years of hands-on experience in DevOps and infrastructure automation
  • Proven experience in managing GPU-based compute environments
  • Strong understanding of Docker, Kubernetes, and Linux internals
  • Familiarity with GPU server hardware and instance types
  • Proficient in scripting with Python and Bash
  • Good understanding of ML model deployment, inferencing workflows, and resource ...

Good to Have:

  • Experience with AI/ML pipelines
  • Knowledge of cloud-native technologies (AWS/GCP/Azure) supporting GPU workloads
  • Exposure to model performance benchmarking and A/B testing

(ref:hirist.tech)
