Required Skills & Experience
• 10+ years overall experience in application engineering
• 7+ years of SRE experience (architect or engineer) with SRE/Observability toolsets such as Dynatrace, AppDynamics, New Relic, Splunk, or Elastic
• 3+ years of experience monitoring applications under various SDLC methodologies, preferably Agile
• 3+ years of technology design expertise, including Performance, Security, Availability, Operations, Monitoring, and Support
• 2+ years of experience with SQL and database management systems such as MSSQL, MySQL, PostgreSQL, or MongoDB
• 2+ years of experience in a scripting language such as Unix shell, Python, or PowerShell
• 2+ years of technology design expertise, including Containerization, Performance, Security, Availability, Operations, Monitoring, and Support
• Experience in Systems Architecture; in-depth knowledge of SRE, IT Operations, and Cloud; coding and scripting experience with Java, JavaScript, Python, and .NET; understanding of AI/ML
• Experience in a regulated industry; financial services experience ideal
• Bachelor's degree in MIS, computer science, math, or another science field required; advanced degree in a related field preferred
Job Description
Insight Global is looking for a Site Reliability Engineer to join their client's team.
• Design, configure, and set up observability platform tools (Splunk and Dynatrace), both on-premises and in the cloud, to guide application development efficiencies and improve the operational stability of applications
• Work with the Observability Manager and Architect to develop the monitoring capabilities strategy and roadmaps and deliver on agreed-upon priorities
• Develop tooling and processes to increase automation of monitoring and adherence to security and audit systems and controls
• Integrate and configure additional tools/frameworks to support and enable automation of various monitoring activities across the enterprise
• Perform analytics on incidents and usage patterns to better predict issues and take proactive actions
• Collaborate across departments to gauge the effectiveness and efficiency of existing systems
• Foster the adoption of Observability tools and capabilities across Technology groups
• Partner with Service owners to implement Service Level Metrics & Service Level Objectives that act as service level health indicators
• Measure, communicate, and deliver on enterprise platform stability and scalability and on the technology organization's DevOps maturity
• Resolve issues, alerts, and incidents based on predefined service level agreements regarding system availability, performance, and service levels
• Analyze monitoring requirements in close collaboration with the architect and translate them into development tasks for engineers
• Deliver presentations to managers and other technology and business partners
• Mentor engineers, providing assistance, guidance, and training