The Site Reliability Engineer, as part of the Product Development team, is responsible for ensuring the availability and performance of our application portfolio, toolset, and platform, as well as helping to drive improvements and enhancements at scale while automating processes linked with building and deploying to the AWS Cloud.
The SR Engineer will also handle operations linked to monitoring SLA-critical production platforms, addressing issues, and performing manual interventions, all in close cooperation with the software development teams.
With these activities you will have a great impact on our business:
Ensure environment stability, security, and performance through SLOs and CI/CD enforcement
Create and improve delivery- and stability-focused tooling across a range of languages and environments
Monitor and control the availability of production services 24/7
Monitor and detect problems in the production environment, and perform 1st- and 2nd-tier infrastructure and application troubleshooting
Participate in system design consulting, platform management, and capacity planning
With these skills you are a great candidate:
3+ years’ experience in a similar role
Coding experience with Python, JavaScript, or an equivalent language
Hands-on experience with public cloud (AWS) and serverless architecture
Experience using Infrastructure as Code (Terraform and/or CloudFormation)
Experience with monitoring tools and vendors such as Prometheus, Grafana, ELK, New Relic, SignalFx, CloudWatch, Datadog, and PagerDuty
#_VOIS