Shrive Technologies

Site Reliability Engineer/SRE

Indianapolis, IN, US

about 1 month ago

Summary

Title: Site Reliability Engineer/SRE

Location: Indiana, Remote

50% Support/Operations

    • Runtime production operations support for Sev 0 and Sev 1 incidents
    • "Super T-shaped" role that can float between squads, with a focus on continuous process improvement

50% Development Engineering

    • Automation of repetitive tasks
    • SREs focus on building and monitoring anything in production that improves service resiliency

Job Description

  • Hands-on experience across the stages of the IT infrastructure management lifecycle.
  • Experience in client relationship management, service integration, team building, and process and people management.
  • Experience successfully managing cloud operations and resources to deliver client satisfaction.
  • Experience building, integrating, deploying, and provisioning cloud services
  • Infrastructure as Code (IaC): Implemented large-scale infrastructure using ARM, CloudFormation (CF), or Terraform templates
  • Experienced in scripting languages such as PowerShell, Python and Shell
  • Experience with configuration management tools (Chef, Puppet, Ansible)
  • Experience with collaboration tools such as the Atlassian suite (Jira, Confluence)
  • Successfully governed data center (DC) consolidation and migration projects
  • Optimize on-premises and cloud infrastructure and participate in design reviews
  • Led multiple implementations of infrastructure monitoring using native and third-party tools
  • Capacity planning and management: create, use, and maintain a capacity model for on-prem and cloud workloads
  • Certified in Cloud Architecture, Operations and Engineering
  • Certified in ITIL and project management

Responsibilities

  • Resolve critical and complex technical issues in a global support delivery team. Combine technical expertise and customer requirements to solve complex business challenges.
  • Quickly identify customer issues with cloud services and conduct in-depth diagnostics on the cloud platform and its services.
  • Perform root cause analysis (RCA) of critical incidents. Analyze and eliminate the top issues impacting customer experience.
  • Create documentation (SOPs and TSGs) to help L1/L2 teams support operations.
  • Work with leadership on process improvement and strategic initiatives
  • Serve as the SME for selecting technology candidates and self-healing capabilities for future service development
  • Perform large-scale automation, combining independent processes into robust behavior

Control Points

  • Provide architectural reviews and sign-offs on a service based on its ability to achieve availability targets
  • Accept or reject services based on their ability to achieve SLAs
  • Validate scalability testing results and test the limits of hardware and software
