Senior DevOps Engineer [Role based in Abu Dhabi]

England, GB

2 months ago

Summary

About the Role:

We are seeking a highly motivated and skilled DevOps/Site Reliability Engineer for an on-site role based in Abu Dhabi, UAE. The ideal candidate is passionate about building, deploying, and maintaining scalable, reliable systems and infrastructure. You will work closely with development teams to ensure smooth deployment pipelines, system stability, and operational efficiency.


Key Responsibilities:

Infrastructure Automation & Management

  1. Design, implement, and maintain CI/CD pipelines to streamline development workflows.
  2. Design and implement scalable infrastructure for AI model deployment and management.
  3. Automate infrastructure provisioning and management using tools like Terraform, Ansible, or CloudFormation.
  4. Optimize cloud-based and on-premises resources to improve system scalability and cost efficiency.
  5. Manage and optimize queuing systems and real-time streaming architectures.

System Reliability & Monitoring

  1. Monitor and troubleshoot production systems to maintain uptime and performance.
  2. Implement robust logging and alerting solutions using tools like Prometheus, Grafana, the ELK stack, or similar.
  3. Implement comprehensive monitoring for both system metrics and ML model performance.
  4. Conduct root cause analyses and post-mortem reviews to improve system reliability.

Collaboration & Support

  1. Work with development and QA teams to integrate new features into production environments seamlessly.
  2. Advocate for best practices in system architecture, security, and performance optimization.
  3. Provide on-call support for critical production systems as part of a rotation schedule.

Security & Compliance

  1. Ensure infrastructure meets security and compliance requirements (e.g., SOC 2, ISO 27001).
  2. Manage secrets and credentials securely using tools like Vault or AWS Secrets Manager.


Required Qualifications:

  1. Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
  2. Strong proficiency in at least one scripting language (e.g., Python, Bash, or Go).
  3. Hands-on experience with cloud platforms like AWS, Azure, or Google Cloud.
  4. Proficiency with containerization and orchestration tools (Docker, Kubernetes).
  5. Experience with CI/CD tools such as Azure DevOps, Jenkins, GitLab CI/CD, or CircleCI.
  6. Knowledge of monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, New Relic, or PagerDuty).
  7. Understanding of networking concepts (DNS, load balancing, firewalls).
  8. Understanding of streaming architectures for real-time AI applications.

Preferred Qualifications:

  1. Experience with Infrastructure as Code (IaC) tools like Terraform or Pulumi.
  2. Knowledge of service mesh technologies (e.g., Istio, Linkerd).
  3. Familiarity with database administration and scaling (vector databases, SQL, and NoSQL).
  4. Previous experience in a similar role in a high-traffic production environment.

Why Join Us?

  1. Opportunity to work on cutting-edge technology and challenging problems.
  2. Collaborative work environment that values innovation and growth.
  3. Competitive salary, benefits, and learning opportunities.
