Infosys

Site Reliability Engineer

Richardson, TX, US

2 months ago

Summary

Must have hands-on experience in:

Observability: Implementing end-to-end monitoring solutions, implementing SLOs and SLIs for customer journeys, using industry tools like Datadog, Dynatrace, AppDynamics, etc.

DevSecOps: Setting up CD pipelines using tools

Cloud Technologies: One of the major cloud platforms (AWS, GCP, or Azure), covering key services: Compute, Storage, and Networking

Infrastructure as Code: Solution design and implementation with industry tools like Terraform, Ansible, etc.

Containerization: Docker, Kubernetes, Helm, etc.

Scripting and Automation: Scripting languages and automation tools

Preferred Skills:

Develop observability solutions (monitoring, anomaly detection, alerting, and self-healing) using industry tools like Datadog, Dynatrace, AppDynamics, New Relic, etc.

Support critical incident resolution in complex environments: applications hosted in the cloud or in data centers, containerized applications, databases, etc.

Set up SLOs and SLIs using industry-leading tools

Play the role of an individual contributor and lead a small team in a global delivery model.

Develop Proof of Concepts (PoCs) and perform hands-on technical tasks based on client needs.

Support responses to client Requests for Proposal (RFPs).

Identify opportunities for automation and implement them.

Experience implementing AI/ML-based monitoring and self-healing solutions

Experience implementing Chaos Engineering/testing
