What To Expect
Tesla is looking for a Site Reliability Engineer to build, enhance, and scale the infrastructure that underpins our Energy IoT applications. These applications provide real-time monitoring, optimization, and control for Tesla’s industry-leading energy products, including Powerwall, Megapack, Solar Roof, Supercharger, Wall Connector, Autobidder, and Virtual Power Plants.
We are a high-impact team that values curiosity, learning, mentorship, open discourse, and making disciplined decisions by weighing trade-offs. Our work supports over 50 engineers and directly affects millions of customers.
If you enjoy thinking in systems and tackling challenges related to the availability, reliability, scalability, and security of distributed software, this role is for you!
You’ll work with and deepen your expertise in Linux, Networking, Kubernetes, on-premises data centers, AWS, Terraform, Prometheus, Helm, GitHub Actions, PostgreSQL, Kafka, InfluxDB, Scala, and Rust.
Join us in accelerating the world’s transition to sustainable energy!
What You'll Do
- Envision and implement changes that improve system reliability
- Conduct deep investigations into new technologies and resolve unexpected issues that arise during operation
- Provide guidance on system architecture and security best practices
- Review, digest, and distill complex code and technical topics to ensure clarity and accessibility for all engineers
- Provide technical leadership, foster collaboration, and drive key initiatives to completion
- Uphold team values, including engineering excellence, curiosity, bias for action, self-awareness, inclusivity, and openness
What You'll Bring
- Minimum of 2 years of relevant industry experience
- Experience in developing, scaling, and maintaining infrastructure for distributed systems, including IoT applications
- Proficiency in many of the following: Linux, Networking, Kubernetes, on-premises data centers, AWS, Terraform, Prometheus, Helm, GitHub Actions, PostgreSQL, and Kafka
- Strong understanding of system design principles and the challenges of ensuring availability, reliability, scalability, and security in distributed software systems
- Effective verbal and written communication skills
- Ability to navigate uncertainty and loosely defined problem statements
- Strong analytical and problem-solving skills, with the ability to evaluate trade-offs and make well-reasoned decisions
- Collaborative mindset with a willingness to learn, mentor, and engage in open discussions