XOPS
Senior Site Reliability Engineer
Pleasanton, CA
3 months ago
Summary
The Senior Site Reliability Engineer (SRE) plays a vital role in ensuring the reliability, scalability, and performance of our enterprise software platform. This is a senior-level position that requires deep technical expertise, strong problem-solving skills, and the ability to collaborate effectively in a fast-paced, demanding environment. Our customers, the largest enterprises in the world, expect 24/7 platform availability and top-tier performance.
The ideal candidate has strong expertise in AWS cloud technologies, a deep understanding of serverless architectures (AWS Lambda), and a passion for building resilient systems to enhance the customer experience.
Platform Reliability:
Design, implement, and manage highly available and scalable systems to meet customer expectations for 24/7 uptime
Monitor, troubleshoot, and resolve platform incidents using tools such as Sentry, New Relic, and custom monitoring frameworks
Lead post-incident reviews to ensure root causes are identified and preventive measures are put in place
Automation and Optimization:
Develop and maintain automation for infrastructure management, monitoring, and incident response
Optimize platform performance and scalability, proactively identifying and addressing bottlenecks
Contribute to the development of CI/CD pipelines to improve deployment reliability and speed
Collaboration:
Partner with L2 engineers to resolve complex customer issues, providing guidance and technical expertise as needed
Work closely with product engineering to ensure platform improvements align with customer needs
Actively contribute to the documentation and sharing of best practices to improve team performance and customer outcomes
Leadership:
Mentor junior engineers and provide technical leadership in reliability engineering
Drive cross-functional initiatives to improve platform stability and customer satisfaction
Requirements
Bachelor's degree in Computer Science or related discipline
8+ years in a Site Reliability Engineering or DevOps role, with experience supporting enterprise-grade software platforms
3+ years of experience with cloud services, particularly AWS
Experience building observability systems on New Relic, CloudWatch, or similar
Experience implementing rate-limiting, API gateways, and load balancing for highly available systems
Exposure to security best practices and compliance frameworks (e.g., SOC 2, ISO 27001)
Proficient in infrastructure as code (IaC) using tools such as Terraform or CloudFormation
Hands-on experience with scripting and programming languages like Python, Go, or Bash
Strong troubleshooting and debugging skills
Excellent communication and collaboration skills
Experience with incident management and post-mortem practices
Soft Skills:
Exceptional problem-solving and critical thinking abilities
Strong verbal and written communication skills, with the ability to navigate ambiguity and provide clarity
Ability to work collaboratively in cross-functional teams under pressure
Key Attributes:
Reliability-Driven: Strong commitment to platform reliability and performance
Leadership and Mentorship: Willingness to guide and mentor less experienced team members
Customer-Focused: Dedication to meeting and exceeding customer expectations in a high-pressure environment
Expectations:
Availability to participate in a 24/7 on-call rotation
Ability to work in a fast-paced, ambiguous environment with rapidly changing priorities
Proactive approach to identifying and mitigating risks before they impact customers
Strong sense of accountability and ownership for platform stability and customer satisfaction
Benefits
Opportunity to work on cutting-edge products and make a real impact
Collaborative and fast-paced work environment
Chance to be part of a rapidly growing startup
Competitive salary and benefits package (health insurance, dental insurance, vision insurance, paid time off, etc.)