Overall Objectives of Job: (If multiple sections, accord weightage to each section)
- Proven experience in an SRE, DevOps, or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
- Deep understanding of cloud platforms such as AWS, Azure, or Google Cloud.
- Strong knowledge of network design, TCP/IP, DNS, routing, and network security best practices.
- Expertise in monitoring tools (Prometheus, ELK).
- Hands-on experience with automation tools (Terraform, Ansible, Jenkins, CI/CD).
- Proficiency with containerization and orchestration (Docker, Kubernetes).
- Proficiency in scripting languages (Bash, Python, Go).
- Familiarity with microservices architecture and distributed systems.
100%
PART 3
Responsibilities
Duties and Responsibilities
List in order of importance and state approximate weightage accorded to each.
Work closely with developers, QA, and operations teams to foster a DevOps culture focused on security, reliability, and automation.
Monitoring & Alerting:
- Design, implement, and manage comprehensive monitoring solutions using tools such as Prometheus, Grafana, and the ELK stack.
- Develop and maintain alerting systems that proactively provide insights into system health and performance.
- Define and track SLIs, SLOs, and SLAs for critical services and ensure continuous compliance.
Automation & Infrastructure Management:
- Automate infrastructure provisioning and management using tools such as Ansible or Terraform to eliminate manual intervention.
- Build and maintain CI/CD pipelines (GitLab CI) to streamline deployments and ensure system consistency.
- Implement automated testing and validation processes for infrastructure and applications.
30
Orchestration & Infrastructure as Code:
- Leverage containerization and orchestration technologies (Docker, Kubernetes) to manage scalable, resilient, and fault-tolerant services.
- Use Infrastructure as Code (IaC) to automate and standardize environment provisioning and configuration management.
20
Networking & Security:
- Ensure the security and compliance of infrastructure by implementing best practices in network security, including encryption, firewall management, access controls, and intrusion detection.
- Perform regular security audits and vulnerability assessments to identify and mitigate risks.
- Monitor network traffic and optimize performance through network tuning and troubleshooting.
20
Reliability Engineering:
- Develop high-availability and disaster recovery solutions for mission-critical services.
- Conduct postmortems for major incidents, perform root cause analysis, and implement preventive measures.
- Collaborate with development teams to optimize applications for performance and security.
- Continuously improve operational processes by identifying bottlenecks, automating workflows, and enhancing security measures.
30
PART 4
Skills
Qualification, Experience, Technical and Functional Skills
Candidates should have the following:
- 10+ years of experience.
- Proven experience in an SRE, DevOps, or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
- Deep understanding of cloud platforms such as AWS, Azure, or Google Cloud.
- Strong knowledge of network design, TCP/IP, DNS, routing, and network security best practices.
- Expertise in monitoring tools (Prometheus, ELK).
- Hands-on experience with automation tools (Terraform, Ansible, Jenkins, CI/CD).
- Proficiency with containerization and orchestration (Docker, Kubernetes).
- Proficiency in scripting languages (Bash, Python, Go).
- Familiarity with microservices architecture and distributed systems.
- Bachelor's degree in Engineering, or MCA / M.Sc. / M.S. / MBA in Systems, IT, Insurance, or Finance.
Soft Skills
- Excellent verbal and non-verbal communication skills.
- Strong team player.
- Good analytical and problem-solving skills.
- Leadership skills.
PART 5
Key Competencies
- Proven experience in an SRE, DevOps, or infrastructure engineering role with a focus on monitoring, automation, and orchestration.
- Deep understanding of cloud platforms such as AWS, Azure, or Google Cloud.
- Strong knowledge of network design, TCP/IP, DNS, routing, and network security best practices.
- Expertise in monitoring tools (Prometheus, ELK).
- Hands-on experience with automation tools (Terraform, Ansible, Jenkins, CI/CD).
- Proficiency with containerization and orchestration (Docker, Kubernetes).
- Proficiency in scripting languages (Bash, Python, Go).
- Familiarity with microservices architecture and distributed systems.
64922 | IT & Tech Engineering | Professional | Allianz Technology | Full-Time | Permanent