SENIOR SITE RELIABILITY ENGINEER
Reports to: Product Care Manager
Location: Toronto (hybrid)
Role Type: Full time (Permanent)
Level: Individual Contributor
Introduction
Kablamo is a fast-growing cloud digital product development company. Founded in Australia in 2017, the business has grown quickly over the last several years and expanded its team to Canada in 2021. We are proud to have assembled an amazing list of customers, including some of the best-known enterprise and government organizations in Australia and Canada. We’re looking to further accelerate our growth in both markets, and we’re seeking a Sr. Site Reliability Engineer to help us support new products as they go to market.
Kablamo is proud to be an AWS Advanced Consulting Partner, and we have recently been recognised as a global leader in designing and building cloud-based data and AI/ML solutions. At the 2021 AWS Global Public Sector conference, Kablamo won the award for “Most Innovative AI/ML Solution” for our work building bushfire prediction data platforms in Australia. We were selected from more than 1,800 AWS global partners.
The Role
As we expand our Product Care offering, we are looking for a Sr. Site Reliability Engineer (SRE) to help us build our capability and deliver insights from massive-scale data in real time. The Sr. SRE is responsible for developing automated solutions for operational concerns such as on-call monitoring, performance and capacity planning, and disaster response. The role complements our ongoing development teams, with a focus on continuous delivery and infrastructure automation.
As the bridge between development and operations, you will be our primary escalation point across key customer accounts.
Key Responsibilities
- Contribute to the design, implementation, and maintenance of our AWS infrastructure
- Anticipate production issues proactively: assess risks, mitigate them, and plan contingencies and countermeasures in advance
- Restore systems to a steady state by quickly investigating and fixing performance, stability, and scalability issues, so that Kablamo meets its SLA and SLO requirements
- Ensure that the underlying infrastructure runs smoothly and that systems and tools work as expected
- Develop or implement visual tools that let technical and business teams observe system health, and support the Technical Account Manager in reporting on reliability metrics
- Use automation tools to solve problems, writing code to automate processes such as analysing logs and testing production environments
- Work with the engineering and/or development team to identify recurring problems that can be resolved through automation
- Enhance the performance, efficiency, and monitoring of software development processes
- Act on system incidents: as the SRE, you are a key contact in incident response and resolution, including active collaboration in any PIRs/post-mortems
- Collaborate closely with product developers to ensure that the designed solution meets non-functional requirements such as availability, performance, security, and maintainability, and to define fields for logging and tracing
- Advocate for reliability against competing priorities
- Help prepare for production releases, including facilitating training and enablement of client technical teams and/or attending appropriate meetings (Technical Working Groups, Architecture Review Boards, Change Advisory Boards)
Required Skills And Experience
- 5+ years’ experience in an SRE or DevOps role
- Deep understanding of system architecture and design principles
- Ability to think critically, solve problems, and perform well under pressure
- Troubleshooting experience, with the ability to communicate clearly with customers or the engineering team
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Experience with AWS and its services (Serverless, Deployment Tools, Networking, Containerization, Security, Cost Management)
- Familiarity with tools such as AWS CloudWatch, Datadog, Grafana, Prometheus, Scalyr, PagerDuty, OpsGenie, Jira Service Management
- Ability to work cross-functionally with support engineering, development teams and/or client vendors to deliver sound outcomes and suggest system improvements
- Understanding of security requirements and their implications, and the ability to conform to applicable security frameworks
- An in-depth knowledge of version control
- CI/CD implementation expertise
- Experience with production rollback
- Knowledge of fundamental network concepts and protocols
- Ability to program in one or more high-level languages, such as Python, Go, Java, C/C++, or JavaScript
- A good understanding of DevOps concepts and best practices including Infrastructure-as-Code
Bonus Points For
- Bachelor’s degree in computer science or other similar technical qualification
- AWS Associate and/or Professional Level Certifications
- Strong grasp of networking, security, and reliability fundamentals
- Solid understanding of Agile methodologies and practices
Career Progression
- Lead SRE
- Principal/Staff SRE
Hiring Process
- 30-min intro chat with our TA team
- 2-hr Technical interview
- 1-hr Cultural interview
- 1-hr Final Interview
- References
- Offer!
Why Work at Kablamo?
Our Culture
We acknowledge that a diverse and inclusive workplace enables greater innovation and produces benefits including improved performance, improved employee happiness and wellbeing, and superior outcomes for our customers. We attribute our success to all our unique and charismatic Kablamites. Through our fortnightly back to base and our debate Thunderdomes, we enable our Kablamites to provide feedback, share ideas, challenge the status quo, and challenge each other technically and constructively.
The PERKS!!!
- Kablamo bonus scheme
- Remote-first, with a downtown Toronto office available
- Work abroad for up to 3 weeks per year (some restrictions apply)
- Career growth (we really do promote from within!)
- Individual training budget
- Online rewards platform
- Regular social events
- Blogging rewards
- Paid birthday leave
- Anniversary bonus
- Referral bonus
- Parental leave top-up
- Employee Assistance Program
- Swag
Kablamo is a proud equal opportunity employer. We make our hiring decisions solely based on your skills and experience, as well as the perspectives and value you can bring to our team. Kablamo believes that diversity is vital to providing the best service to our clients, and we are committed to fostering a varied and inclusive work environment. Every effort will be made to accommodate candidates' accessibility needs upon request. Information received in relation to accommodations will be handled confidentially.
Kablamo would like to thank all candidates for their interest; however, only qualified applicants will be shortlisted.