Solarwinds Corp.

Staff Site Reliability Engineer

Austin, TX, US

Hybrid

Full-time

about 2 months ago

Summary

Your Impact:

We are seeking a Staff Site Reliability Engineer (Infrastructure & Site Reliability Engineering) with extensive experience in AWS, Azure, Kubernetes, and GitOps to lead our Site Reliability Engineering (SRE) team. The successful candidate will deeply understand SRE practices and have a track record of implementing high-quality site reliability engineering practices (SLAs, SLOs, proactive alert management, incident response/review, postmortems, etc.). In this role, you will work with our SRE and cross-functional engineering teams to develop and operate our development and production infrastructure and operations.

Your Role:

* Work collaboratively with software engineering on infrastructure and deployment requirements
* Contribute actively to our automation and observability initiatives
* Build and maintain operational tools for deployment, monitoring, and analysis of cloud (AWS & Azure) infrastructure and systems
* Collaborate with senior team members in responding to production incidents, actively contribute to postmortems, and engage in continuous improvement efforts as part of on-call rotations for exposure to critical issue resolution
* Establish and drive operations performance through SLOs
* Provide project management, sprint planning, and road-mapping support to the SRE team
* Apply expert-level technical skills and mentor team members
* Follow the practices our team uses to maximize development velocity, including but not limited to continuous integration/deployment and code review via GitHub pull requests

Your Experience:

* Strong customer orientation
* Excellent interpersonal and organizational skills
* Attention to detail and focus on quality
* Strong communication skills to effectively liaise with both technical and non-technical staff
* Ability to act decisively and work well under pressure
* Must be a collaborative problem solver
* Strong bias for ownership and action

Qualifications:

* 10+ years of experience designing, building, and maintaining SaaS environments
* 6+ years of experience designing, building, and maintaining AWS/Azure infrastructure with Terraform
* 3+ years of experience building and running Kubernetes, ClickHouse, MySQL, and Kafka clusters
* Experience with observability (monitoring: logging, tracing, metrics)
* Experience with GitOps CI/CD processes
* Experience scripting with Python, Go (Golang), bash, or PowerShell and AWS CLI tools
* Experience with security operations: security policies, infrastructure, key management, and setup of encryption at rest and in transit
