Plan, manage, and oversee all aspects of a production environment
Define strategies for application performance monitoring and optimisation in a production environment
Respond to incidents
Improve the platform based on feedback and measure the reduction of incidents over time
Support deployment of code into multiple lower environments
Support current processes with an emphasis on automating everything as soon as possible
Design, develop and standardise a monitoring and alerting mechanism for the supported applications
Take a holistic approach to problem solving by connecting the dots across the technology stacks that make up the platform during a production event, to optimise mean time to recover
Engage in and improve the whole lifecycle of services - from inception and design, through deployment, operation and refinement
Analyse ITSM activities of the platform and provide a feedback loop to development teams on operational gaps or resiliency concerns
Support services before they go live through activities such as system design consulting, capacity planning and launch reviews
Support the application CI/CD pipeline for promoting software into higher environments through validation and operational gating, and lead in DevOps automation and best practices
Maintain services once they are live by measuring and monitoring availability, latency and overall system health
Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
Work with a global team spread across tech hubs in multiple geographies and time zones
Ability to share knowledge and explain processes and procedures to others
Share knowledge and mentor junior team members
Ability to perform on-call duties on a rotational basis
Occasional off-hours work required
Skills Required
Must have:
Linux
Mainframe
Shell scripting
ITIL / ITSM
Application troubleshooting
SQL
Any monitoring tool (Splunk / Dynatrace preferred)
Jenkins - CI/CD
Groovy scripting / YAML (basic)
Git (basic) / Bitbucket (basic)
Good to have:
Ansible / Chef
Event framework architecture