Interesting Job Opportunity: Sandhata - Site Reliability Engineer - Elastic Kubernetes Service
Chennai, TN, IN
2 months ago
Summary
Job Title: Site Reliability Engineer (SRE)
Position Overview
We are seeking an experienced Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will play a pivotal role in ensuring the reliability, availability, and performance of our cloud-based infrastructure hosted on AWS with EKS. You will work closely with cross-functional teams to implement best practices for monitoring, automation, and continuous integration and deployment using tools such as Datadog and Azure DevOps. The ideal candidate should have a solid background in cloud technologies, troubleshooting, and production release support.
Responsibilities
Collaborate with development and operations teams to design, implement, and manage scalable and reliable infrastructure solutions on AWS using EKS (Elastic Kubernetes Service).
Develop, maintain, and enhance monitoring and alerting systems using Datadog to proactively identify and address potential issues, ensuring optimal system performance.
Participate in the design and implementation of CI/CD pipelines using Azure DevOps, enabling automated and reliable software delivery.
Lead efforts in incident response and troubleshooting to quickly diagnose and resolve production incidents, minimizing downtime and impact on users.
Take ownership of reliability initiatives by identifying areas for improvement, conducting root cause analysis, and implementing solutions to prevent recurrence of incidents.
Work with the development teams to define and establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain the system's reliability.
Contribute to the documentation of processes, procedures, and best practices to enhance knowledge sharing within the team.
Qualifications
Minimum of 4 years of experience in a Site Reliability Engineer or similar role, managing cloud-based infrastructure on AWS with EKS.
Strong expertise in AWS services, especially EKS, including cluster provisioning, scaling, and management.
Proficiency in using monitoring and observability tools, with hands-on experience in Datadog or similar tools for tracking system performance and generating meaningful alerts.
Experience in implementing CI/CD pipelines using Azure DevOps or similar tools to automate software deployment and testing.
Solid understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes) and their role in modern application architectures.
Excellent troubleshooting skills and the ability to analyze complex issues, determine root causes, and implement effective solutions.
Strong scripting and automation skills (Python, Bash, etc.).
(ref:hirist.tech)