Insight Global

Site Reliability Engineer

Plano, TX, US

9 days ago

Summary

Required Skills & Experience

• 10+ years overall experience in application engineering

• 7+ years of SRE experience (architect or engineer) with SRE/Observability toolsets such as Dynatrace, AppDynamics, New Relic, Splunk, or Elastic

• 3+ years of experience monitoring applications under various SDLC methodologies, preferably Agile

• 3+ years of technology design expertise, including Performance, Security, and Availability, as well as Operations, Monitoring, and Support

• 2+ years of database management experience with SQL and systems such as MSSQL, MySQL, PostgreSQL, or MongoDB

• 2+ years of experience with a scripting language such as Unix Shell, Python, or PowerShell

• 2+ years of technology design experience, including Containerization, Performance, Security, Availability, Operations, Monitoring, and Support

• Experience in Systems Architecture; in-depth knowledge of SRE, IT Operations, and Cloud; coding and scripting experience with Java, JavaScript, Python, and .NET; understanding of AI/ML

• Experience in a regulated industry; financial services experience ideal

• Bachelor's degree in MIS, computer science, math, or other science field required; advanced degree in a related field preferred

Job Description

Insight Global is looking for a Site Reliability Engineer to join its client's team.

• Design, configure, and set up observability platform tools (Splunk and Dynatrace), both on-premises and in the cloud, to improve application development efficiency and the operational stability of applications

• Work with the Observability Manager and Architect to develop the monitoring capabilities strategy and roadmaps and deliver on agreed-upon priorities

• Develop tooling and processes to increase automation of monitoring and adherence to security and audit systems and controls

• Integrate and configure additional tools/frameworks to support and enable automation of various monitoring activities across the enterprise

• Perform analytics on incidents and usage patterns to better predict issues and take proactive actions

• Collaborate across departments to gauge the effectiveness and efficiency of existing systems

• Foster the adoption of Observability tools and capabilities across Technology groups

• Partner with Service owners to implement Service Level Metrics & Service Level Objectives that act as service level health indicators

• Measure, communicate, and deliver on enterprise platform stability and scalability and on the technology organization's maturity in DevOps

• Resolve issues, alerts, and incidents based on predefined service level agreements regarding system availability, performance, and service levels

• Analyze the monitoring requirements in close collaboration with the architect and translate them into tasks for engineers to develop

• Deliver presentations to managers and other technology and business partners

• Be a mentor to engineers, providing assistance, guidance and training
