Kforce

Lead Site Reliability Engineer (AWS/Azure)

San Diego, CA, US

Hybrid
Full-time
8 days ago

Summary

Kforce has a client that is seeking a Lead Site Reliability Engineer (AWS/Azure) in San Diego, CA.

Overview: The Lead Site Reliability Engineer is responsible for driving the organizational reliability strategy and conducting resiliency design reviews to ensure that the reliability, scalability, and performance of the company's software systems and applications meet organizational service level objectives (SLOs) and error budgets. The role leads a team of Site Reliability Engineers in designing, implementing, and maintaining the infrastructure and tools needed to support the company's platforms, and in improving its monitoring, automation, and deployment processes. It involves strategic planning, technical leadership, and collaboration with stakeholders, including the Company's Product Delivery, Data Services, DevOps, DataOps, and Infrastructure teams, to support organizational goals.

Requirements:

* Bachelor's degree, 8+ years of demonstrated work experience, or an equivalent combination of related training and experience, with at least three of those years spent in a leadership-level role (required)
* Proven leadership experience and ability to manage a team
* Experienced in cloud-based hosting solutions (AWS, Azure)
* Experienced with cloud server environments (AWS, Azure)
* Experienced in Agile software development best practices, using Continuous Integration & Delivery pipelines as well as Agile tools such as Jira
* Proven experience with large-scale software implementation (high transaction volume, high-availability concepts)
* Deep knowledge of software deployment, versioning (Git), and release management processes
* Deep knowledge of infrastructure design, implementation, and support
* Collaborate with stakeholders to define RPO/RTO for the Company's system footprint
* Expert in cloud-based redundancy, high availability, and reliability strategies
* Expert in reliability, scalability, and performance optimization
* Expert at Linux/Unix (strongly preferred) and Windows systems administration, including provisioning, configuring, monitoring, and troubleshooting web servers in a 24x7 customer-facing environment
* Strong Linux and Windows administration and scripting
* Solid database administration skills (MySQL, MariaDB, RDS, SQL Server, and Azure Storage services)
* Deep knowledge of current methodologies in high-performance operations and scalable multi-site implementations
* Proficient at automated provisioning, automated configuration management, and containerization solutions and tools
* Excellent written and verbal communication skills
* Proficient in communicating at both technical and management levels
* Highly adaptable
* Ability to create DR strategies and execute DR drills
* Ability to interact with external customers and staff members
* Ability to work in a fast-paced, constantly expanding environment
