TATA Consulting Services

SRE Architect

Marlboro, MA, US

Remote
Full-time
$110k–$125k/year
2 days ago

Summary

1. Technical Expertise
* Deep understanding of SRE principles, SRE model, and DevOps methodologies.
* Experience designing highly available, scalable, and resilient distributed systems.
* Proficient in architectural design (Microservices, Cloud-native, Event-driven architecture).
* Skilled in cloud platforms: Azure, GCP.
* Strong knowledge of observability tools: UIM, Prometheus, Grafana, Datadog, New Relic, Splunk, AppDynamics.

2. Framework Design & Governance
* Define and validate SLOs, SLIs, SLAs, error budgets, and availability targets.
* Design runbooks, escalation policies, and chaos testing frameworks.
* Create reusable templates for observability, alerting, and logging.
* Ensure compliance and audit readiness.

3. Communication & Cross-Functional Leadership
* Collaborate with architects, designers, platform and infra teams.
* Document frameworks and lead adoption across teams.
* Review designs and validate reliability criteria.

Roles & Responsibilities:

1. Framework & Standardization
* Define and maintain the SRE operating model, framework, and onboarding guide.
* Create templates and reference architectures for observability, alerting, and runbooks.
* Standardize definitions of availability, reliability, latency, and performance.

2. Architectural Integration
* Participate in application architecture reviews to validate SRE compliance.
* Recommend design patterns for fault tolerance, failover, auto-scaling, and DR.
* Define observability-by-design principles.

3. Governance, Audit & Optimization
* Establish and lead SRE councils or review boards.
* Define SRE maturity models, scorecards, and compliance checks.
* Perform SRE audits across product portfolios.
* Guide teams on capacity modeling, load distribution, and cost-efficiency strategies.
* Collaborate with platform teams on resource reservations and right-sizing.

4. Tool Rationalization & Strategy
* Evaluate and recommend standard SRE toolchains for monitoring, logging, tracing.
* Own the integration strategy across observability platforms.

5. Training, Leadership & Evangelism
* Conduct SRE bootcamps for application and infra teams.
* Champion a blameless culture and continuous improvement mindset.
* Drive Error Budget policies and reliability trade-off discussions.
* Mentor product teams on SRE integration strategies.
* Influence architectural decisions with SRE perspectives.

#LI-RJ2

Salary Range: $110,000-$125,000 a year
