Monitoring/Observability tools - Dynatrace, ELK, etc.
Platform/cloud Observability - OpenShift Prometheus, Azure Cloud, etc.
Automation skills - API integration, scripting, etc.
Key Responsibilities:
Experience collaborating with infrastructure, application, platform, and cloud teams on Observability solutions.
Experience implementing monitoring solutions using APM tools, with Grafana for visualization - setup, configuration, and development of monitoring/alerting solutions.
Experience managing the Grafana platform with team-specific dashboards covering various KPIs and data sources, enabling alerts, and establishing SLOs.
Troubleshoot and resolve issues related to Observability solutions, identifying gaps and challenges and addressing them as part of production incidents.
Technical expertise in analyzing infrastructure systems, services, and technologies for their monitoring, alerting, and incident response needs.
Experience working with application, platform, and infrastructure services in resilient, scalable, and highly available environments.
Collaborate with application and service teams/SMEs to integrate monitoring solutions through automation - APIs, webhooks, and CI/CD deployments.
Document system configurations, standard operating procedures, and best practices.
Stay current with the latest trends in enterprise technologies, platforms, automation, and AI-based solutions.