As an SRE Engineer, you will play a critical role in ensuring our trading services are always available, scalable, and
engineered to withstand peak demand.
· You will be deeply involved in incident management, troubleshooting, and root cause analysis, with a strong
emphasis on automation and improving our operational processes.
· Your expertise in DevOps practices, coupled with your strong skills in Dynatrace, Splunk, and Grafana, will be
essential in monitoring, visualizing, and troubleshooting our systems.
Key Responsibilities:
· Incident Management: Lead and manage incident response and blameless post-mortems, ensuring quick
recovery and preventing recurrence.
· Monitoring and Observability: Use Dynatrace, Splunk, and Grafana to implement comprehensive monitoring
and observability frameworks for proactive incident detection and visualization of performance metrics.
· DevOps Integration: Collaborate with development teams to integrate DevOps practices into the software
development lifecycle, enhancing CI/CD pipelines for better reliability and efficiency.
· Root Cause Analysis: Conduct in-depth root cause analysis for incidents and outages, developing long-term
solutions to prevent recurrence.
· Performance Tuning: Optimize system performance by identifying bottlenecks and implementing scalable
solutions.
· Automation: Develop automation tools and scripts to reduce manual intervention, improve system reliability,
and streamline operational processes.
· Documentation: Create and maintain detailed documentation for system architecture, incident reports, and
operational procedures.
· Collaboration and Leadership: Work closely with cross-functional teams to share knowledge, mentor junior
team members, and promote a culture of reliability and continuous improvement.
SRE exposure (5-15 years), DevOps, and exposure to high-volume transaction management with APM tools (Dynatrace, Splunk, and Grafana)