Splunk

Senior Site Reliability Engineer - Observability

Bengaluru, KA, IN

4 days ago

Summary

Join us as we pursue our ground-breaking vision to make machine data accessible, usable, and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we are committed to our work, to our customers, to having fun, and, most importantly, to each other’s success.

The Splunk Observability Cloud provides full-fidelity monitoring and troubleshooting across infrastructure, applications, and user interfaces, in real time and at any scale, to help our customers keep their services reliable, innovate faster, and deliver great customer experiences. Infrastructure Software Engineers at Splunk are cloud-native systems engineers who use infrastructure-as-code, microservices, automation, and efficient design to build, operate, and scale our products.

About You

First and foremost, you have strong troubleshooting and problem-resolution skills. You work well under pressure and have strong written and verbal communication skills. You pride yourself on being a self-starter who leads by example and has experience working in a rapidly changing environment. You also have:

  • Minimum of a Bachelor's degree in CSE, EE, CSM, or a related technical discipline; MS degree desired
  • 9+ years of Site Reliability, DevOps, and/or Software Development experience, ideally in a growth-stage environment
  • Experience operating within, and supporting, complex SaaS production or revenue-critical 24/7 web services environments
  • Must have experience developing and operationalizing system installations and upgrades
  • Strong experience with Unix/Linux system administration, especially Red Hat Linux (Alma)
  • Experience running and administering services in AWS or other cloud platforms (Azure, GCP)
  • Significant experience with one or more scripting/coding languages, ideally with Terraform or Python
  • Experience with big data platform engineering
  • Experience with scaling and operationalizing distributed data stores, file systems, and services (Kafka, Elasticsearch, etc.); familiarity with Lambda architecture is a big plus
  • Experience with virtualization and containerization platforms (Docker), container orchestration tools (Kubernetes), and Kubernetes tooling that eases delivery (Istio/Helm)
  • Availability for occasional on-call after-hours support

About The Role

  • Imagine a situation where you have hundreds of engineers building and pushing microservices into a highly available SaaS platform built on a cutting-edge technology stack.
  • Imagine keeping that platform running and scaling it infinitely across multiple geolocations and cloud providers.
  • Imagine leading a team of stellar DevOps engineers who are highly motivated not just to keep this platform running, but to constantly experiment with how to make it better, which new technologies to adopt, and how to keep evolving the platform and roll out infrastructure modernization without customers even noticing a glitch.
  • Imagine a world that is ready to shift left and is on the brink of a major cultural shift to a true DevOps model. That world is waiting for YOU. Do you want to be that person? If yes, let's chat!

Day-to-day Responsibilities Include

  • Helping to build infrastructure to facilitate rapid service deployments
  • Documenting findings and recommendations for improvement
  • Helping lead full-stack platform infrastructure projects
  • Maintaining and enhancing deployment tools and methodologies; playing a lead role in advancing our 'Infrastructure as code' architecture
  • Leading the evaluation and development of our data ingestion pipeline to be deployed 'as a service'
  • Creating repeatable, efficient, and scalable artifact deployment pipelines
  • Making recommendations to, and interfacing with, engineering to ensure 100% application uptime
  • Monitoring the SaaS environment and working with QA, Developers, and Ops to identify and solve problems
  • Ensuring that failover mechanisms are in place and are working correctly
  • Responding to and resolving technical emergencies

Bachelor's/Master's in Computer Science, Engineering, or a related technical field, or equivalent practical experience.

We value diversity, equity, and inclusion at Splunk and are an equal employment opportunity employer. Qualified applicants receive consideration for employment without regard to race, religion, color, national origin, ancestry, sex, gender, gender identity, gender expression, sexual orientation, marital status, age, physical or mental disability or medical condition, genetic information, veteran status, or any other consideration made unlawful by federal, state, or local laws. We consider qualified applicants with criminal histories, consistent with legal requirements.
