About Us: We are a company that provides innovative, transformative IT services and solutions. We are passionate about helping our clients achieve their goals and exceed their expectations. We strive to provide the best possible experience for our clients and employees. We are committed to continuous improvement and innovation, and we are always looking for ways to improve our services and solutions. We believe in working collaboratively with our clients and employees to achieve success.
DS Technologies Inc is looking for a Full-Time Site Reliability Engineer (SRE) for one of our premier clients.
Job Title: Full-Time Site Reliability Engineer (SRE)
Location: Mountain View, CA (Hybrid)
Industry: Technology / Software Development
Job Category: Site Reliability Engineering (SRE)
Overview
We are seeking a Site Reliability Engineer (SRE) to build and manage scalable, reliable, and high-performance systems for Intuit's services. This role focuses on automation, monitoring, troubleshooting, and ensuring system uptime and reliability.
Position: Site Reliability Engineer (Full-Time)
Responsibilities
Build and manage scalable and high-performance systems for Intuit's services
Ensure uptime and reliability of critical production systems
Troubleshoot and resolve incidents in collaboration with the engineering team
Implement automation tools and processes to streamline operations
Design and develop solutions for monitoring, alerting, and reporting of system performance
Analyze system logs and metrics to optimize operational efficiency
Work with development teams to enforce best practices for reliability and scalability
Participate in incident management, root cause analysis, and post-mortem reviews
Ensure security best practices are followed in system design and deployment
Improve infrastructure monitoring, deployment processes, and overall operational efficiency
Required Qualifications
5+ years of experience in Site Reliability Engineering or a related field
Strong experience with cloud platforms (AWS, Azure, or GCP)
Proficiency in Python, Go, or Java for automation and system development
Solid knowledge of CI/CD pipelines and automation tools
Experience with containerization technologies (Docker, Kubernetes)
Strong knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
Hands-on experience with incident management, root cause analysis, and troubleshooting
Ability to work in a fast-paced, dynamic environment
Candidate Details
Must have strong cloud infrastructure and automation experience
Ability to collaborate with cross-functional teams to enhance system reliability
Strong problem-solving skills with a focus on scalability and performance optimization
If you are interested, kindly share your resume to