ARMITAGE ASSOCIATES LIMITED

DevOps Manager

Toronto, ON, CA

9 days ago

Summary

DevOps, Manager


Our client, one of Canada's fintech leaders, is seeking an experienced DevOps Lead to take ownership of its application operationalization efforts. This role is for a seasoned professional with a deep understanding of cloud platforms, automation, and Infrastructure-as-Code (IaC) within an Agile environment. You will lead the charge in optimizing and scaling applications, driving operational excellence, and ensuring secure, reliable, and efficient application delivery.


The Manager will be responsible for:

  • Leading the operationalization and stability of an e-commerce application, ensuring seamless deployment, monitoring, and management. Implement and oversee processes that enhance the reliability, scalability, and performance of the application.
  • Providing strategic technical direction to the AppOps team and developers. Mentor and guide team members to foster a culture of continuous improvement, collaboration, and innovation.
  • Leading optimization initiatives by leveraging comprehensive monitoring, diagnostic tools, and analytics. Use data-driven insights to proactively address potential issues.
  • Developing, managing, and provisioning cloud infrastructure using IaC tools (Terraform, Ansible, Chef, etc.) to support application scalability and automation.
  • Managing and optimizing the use of cloud-native tools, including serverless architectures, microservices, and managed services, to support scalable application operations.
  • Leading the development and implementation of automation solutions that reduce manual interventions, such as self-healing and auto-scaling systems, streamlining cloud infrastructure and application management processes.
  • Collaborating with development, operations, and security teams to embed security best practices into all aspects of the application lifecycle, including CI/CD pipelines and IaC.
  • Leading the management of application and infrastructure certificates, ensuring they are kept up to date and secure. Oversee vulnerability management processes, including scanning, assessment, prioritization, and remediation to maintain a robust security posture.
  • Working closely with developers to implement deployment strategies that ensure little to no application downtime, including canary deployments, blue-green deployments, and rolling updates. Ensure robust rollback and failover procedures are in place.
  • Leading the design and implementation of automated post-implementation verification (PIV) processes to ensure the stability and functionality of applications after each release. Collaborate with QA and development teams to automate end-to-end testing and validation.
  • Instilling a sense of ownership for applications, features, and services across the team, driving accountability throughout the software development lifecycle.
  • Establishing and tracking key application metrics (KPIs, SLIs, SLOs, SLAs) and customer experience metrics to ensure operational excellence and inform continuous improvement efforts.
  • Working alongside technical document writers to create and maintain comprehensive documentation for systems, automations, and operational processes. Ensure knowledge is shared across teams to build a resilient operational environment.
  • Leading on-call support for critical application operations to ensure 24/7 availability and quick resolution of incidents.


Qualifications:

  • Bachelor's degree in Computer Science, Information Technology, or related field.
  • 5-8+ years of experience in AppOps, DevOps, or SRE roles, with at least 3-5+ years in a technical leadership capacity overseeing large-scale, cloud-based applications.
  • 5-8+ years of experience leading Agile and DevOps transformation within teams, driving continuous integration, continuous delivery (CI/CD), and Infrastructure-as-Code (IaC) practices.
  • 5-8+ years of expertise in driving cross-functional teams (developers, SRE, QA, security) to deliver highly available, scalable, and resilient applications.
  • 5-8+ years of experience in Infrastructure as Code (IaC) and automation technologies (Terraform, Ansible, Chef, Pulumi) and ability to guide teams in adopting best practices.
  • 5-8+ years of experience with SRE principles and practices, including SLAs, SLOs, and error budgets, to drive operational excellence and reliability.
  • Proven track record in designing and implementing monitoring, alerting, and observability frameworks using tools like Prometheus, Grafana, ELK, Dynatrace, and Splunk to ensure proactive issue detection and resolution.
  • Proficiency in programming and configuration languages (Java, Python, Go, YAML, etc.).
  • Ability to mentor and develop engineering talent, fostering a culture of continuous improvement, automation, and end-to-end ownership of services.
  • Must be eligible to work for Interac Corp. in Canada in a full-time capacity.
  • Certifications in relevant cloud technologies and automation tools (e.g., Azure, AWS, Kubernetes, Terraform) are a plus.


Nice to have:

  • Microsoft Certified: Azure Administrator Associate (or similar across AWS, GCP, etc.)
  • Certified Kubernetes Administrator
  • Site Reliability Engineering (SRE) Certification
  • Terraform Associate Certification
  • ITIL Foundation
