Must have hands-on experience in:
Observability: Implementing end-to-end monitoring solutions, implementing SLOs and SLIs for customer journeys, using industry tools like Datadog, Dynatrace, AppDynamics, etc.
DevSecOps: Setting up CD pipelines using tools
Cloud Technologies: One of the major cloud platforms (AWS, GCP, or Azure), covering key services: Compute, Storage, and Networking
Infrastructure as Code: Solution design and implementation with industry tools like Terraform, Ansible, etc.
Containerization: Docker, Kubernetes, Helm, etc.
Scripting and Automation: Scripting languages and automation tools
Preferred Skills:
Develop and implement observability solutions (monitoring, anomaly detection, alerting, and self-healing) using industry tools like Datadog, Dynatrace, AppDynamics, New Relic, etc.
Support critical incident resolution in a complex environment – applications hosted on cloud or datacenters, containerized applications, databases, etc.
Set up SLOs and SLIs using industry-leading tools.
Work as an individual contributor and lead a small team in a global delivery model.
Develop Proof of Concepts (PoCs) and perform hands-on technical tasks based on client needs.
Support responses to client Requests for Proposal (RFPs).
Identify and analyze opportunities for automation, then implement them.
Experience implementing AI/ML-based monitoring and self-healing solutions.
Experience implementing chaos engineering/testing.