Job Description: SRE Engineering Manager - DevOps & Reliability
Overview: We are seeking an experienced SRE Engineering Manager to lead our Site Reliability Engineering (SRE) and DevOps teams. This leader will oversee the software development lifecycle and drive improvements in automation, reliability, and performance, while keeping the teams' work aligned with company objectives.
The role combines technical expertise, people leadership, and strategic thinking. The SRE Engineering Manager will be accountable for the reliability and efficiency of our systems, applications, and infrastructure, which are key to the success of the business.
Responsibilities:
1. Leadership & Team Development:
   - Oversee the software development lifecycle from design to deployment, ensuring on-time delivery of high-quality software.
   - Manage, mentor, and develop a high-performing team of engineers, ensuring they keep growing and can meet evolving business needs.
2. Strategic Leadership & Execution:
   - Define and track key metrics such as mean time to recovery (MTTR), deployment frequency, error rates, and change failure rate to assess the success of SRE and DevOps initiatives.
   - Take full ownership of product development execution across R&D, Solutions Engineering, DevOps, and reliability, ensuring timely and efficient delivery.
3. Process & Standards Improvement:
   - Promote and implement agile methodologies and engineering best practices (e.g., CI/CD, automated testing, and DevSecOps).
   - Standardize DevOps tools, practices, and governance across the organization, working closely with DevOps architects and engineers.
4. Collaboration & Stakeholder Engagement:
   - Act as the bridge between engineering teams, product management, security, and platform teams to ensure clear communication and effective collaboration.
   - Advocate for the adoption of SRE-driven toolchains and reliability practices across the organization.
5. Multi-Vertical & Multi-Platform Operations:
   - Manage the operations and reliability of services across multiple business verticals and platforms.
Technical Expertise:
- Strong background in DevOps and SRE, with hands-on experience in CI/CD pipelines, Kubernetes, Terraform, AWS/GCP/Azure, and observability tools.
- In-depth understanding of Infrastructure-as-Code (IaC) and configuration tooling such as Terraform, Ansible, and Helm, as well as GitOps practices.
- Expertise in reliability engineering practices, including SLIs, SLOs, error budgets, and incident management.
- Solid experience with DevSecOps practices, integrating security and compliance checks into DevOps workflows.
- Proficiency with monitoring and observability tools such as Prometheus, Grafana, OpenTelemetry, and the ELK stack (Elasticsearch, Logstash, Kibana).
Why Join Us?
As an SRE Engineering Manager, you will shape the future of our engineering operations. You will work with a talented, motivated team and be at the forefront of implementing cutting-edge technology solutions. We offer a dynamic, collaborative work environment with opportunities for growth, leadership, and impact.