Signify Technology

Lead Site Reliability Engineer

London, England, GB

10 days ago

Summary

This role plays a key part in the global follow-the-sun support model, working closely with the Global SRE Leader to support platforms worldwide. We are looking for SRE talent with experience in an On-Prem / Datacenter environment.


The ideal candidate will bring strong technical leadership, experience in an On-Prem / Datacenter environment, and a passion for operational excellence to a high-impact team. You'll collaborate with Engineering, Infrastructure, and Operations teams to maintain high availability and resilient service delivery, while also mentoring an SRE team focused on continuous improvement and innovation.


Key Responsibilities:


Technical Leadership

  • Develop deep expertise in the Titanium trading platform to lead and support critical business operations.
  • Oversee team workload, ensuring priorities align with business goals and resource capacity.


Operational Excellence

  • Champion initiatives that enhance system availability, scalability, and performance.
  • Collaborate with the Global SRE Leader to refine and enforce operational policies (e.g., Capacity Planning, Change Management, Disaster Recovery).


Cross-Functional Collaboration

  • Partner with Software Engineering, Infrastructure, Operations, Security, and Business teams to deliver secure and reliable platforms.


Team Development

  • Build, lead, and mentor a high-performing SRE team in Europe, fostering a culture of ownership, collaboration, and innovation.


Incident Response & Postmortems

  • Lead response efforts for critical incidents, ensuring swift resolution and comprehensive root cause analysis.
  • Drive long-term improvements based on lessons learned from Learning Reviews, and maintain accurate incident documentation and compliance reporting.


Automation & Efficiency

  • Lead automation initiatives to streamline workflows and increase uptime.
  • Use Jira to manage tasks and projects, and align global SRE practices for seamless support.


Capacity Planning

  • Drive timely capacity planning to prevent last-minute issues.
  • Support budget planning to align infrastructure investments with growth and performance targets.
  • Participate in quarterly capacity reviews and follow up on outcomes.


Monitoring & Analytics

  • Oversee the implementation of monitoring and alerting systems to detect and resolve issues proactively, before customer or compliance impacts occur.


Qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related field (Master’s preferred)
  • 5+ years in a technical SRE or DevOps position
  • 2+ years in a leadership or senior engineering capacity


Preferred Skills:

  • Strong Python programming skills
  • Proficiency in SQL and data analytics tools (e.g., Sigma, Snowflake)
  • Experience with AWS, monitoring tools (Datadog, Prometheus, Grafana), and automation frameworks (Terraform, Ansible, Pulumi)


For more information, please apply with a relevant CV.
