AWS Cloud Management : Minimum 5 years of hands-on experience in deploying and managing AWS services such as EC2, VPC, RDS, S3, ALB, Route 53, API Gateway, and CloudFront in production environments. AWS Certified Solutions Architect - Associate (SAA) or Solutions Architect - Professional (SAP) is preferred.
Infrastructure as Code (IaC) : Minimum 4 years of experience with Terraform for provisioning, managing, and scaling cloud infrastructure. HashiCorp Certified: Terraform Authoring and Operations Professional or HashiCorp Certified: Terraform Associate (003) is preferred.
Linux Administration : Proficiency in Linux system administration, including patching, security hardening, and performance tuning in cloud environments.
Observability & Monitoring : Hands-on experience with CloudWatch, New Relic, OpenSearch, Sumo Logic, or similar tools for log analysis, monitoring, and troubleshooting.
Scripting & Automation : Proficiency in Python, Shell, or PowerShell for automating cloud infrastructure tasks, deployments, and operational workflows.
Cloud Networking & Security : Strong understanding of VPC networking, VPNs, load balancers, and security best practices, including WAFs (Web Application Firewalls) and endpoint protection tools.
Site Reliability Engineering (SRE) Practices : Experience working with SLAs, SLIs, and SLOs, managing incidents, and ensuring system reliability for production workloads.
Disaster Recovery & Resilience : Experience conducting DR and BCP tests for AWS workloads and implementing high-availability architectures.
Roles & Responsibilities
AWS Cloud Administration : Own the troubleshooting, administration, and optimization of AWS environments, applications, databases, and servers.
Infrastructure as Code (IaC) : Design, implement, and maintain cloud infrastructure using Terraform.
Compliance & Security Audits : Support the implementation and auditing of security and compliance frameworks such as SOC 2, StateRAMP, and FedRAMP.
Automation & Scripting : Develop and optimize automation scripts for infrastructure provisioning, patching, and operational tasks.
Production Readiness & Reliability : Conduct regular DR tests, enforce reliability best practices, and optimize cloud resource utilization.
Incident & Change Management : Lead incident response for production issues, perform root cause analysis, and collaborate with teams to implement preventive measures.
On-Call & Maintenance : Participate in a 24/7 on-call rotation (one week per month) and in patching activities on one Saturday per month (compensatory time off provided).
(ref:hirist.tech)