Serve as a subject matter expert for distributed application systems running on hybrid cloud platforms.
Champion and drive operational improvements using insights from metrics and customer feedback.
Lead incident response and post-incident reviews.
Communicate complex topics with development teams to investigate and document issues, and lead the internal team in developing solutions to mitigate them.
Manage and maintain enterprise applications and cloud-based systems using tools and frameworks designed for secure and scalable in-house deployments.
Monitor and optimize the health and performance of applications and platforms.
Debug problems reported by partners and end-users using in-depth log analysis and stack traces.
Create comprehensive documentation for operational procedures and environment setup.
Eliminate operational toil through automation or process improvements.
Participate in a 24x7 shift rotation.
Your Qualifications
Bachelor’s degree in any Information Technology or Engineering course.
Demonstrated ability to support critical production services and improve operations through automation and process enhancements.
Subject matter expertise in Platform as a Service support, distributed systems, and microservices, particularly in hosted services such as content delivery, messaging, API gateways, and proxies.
Strong communication skills, both written and verbal.
At least 5 years’ experience working with the following:
Linux Administration: RHEL, CentOS, or other Unix-like systems
Server and Infrastructure Troubleshooting: Hardware and OS Configuration
Logging and monitoring: Splunk, Grafana, Prometheus
Container Orchestration: Docker, Kubernetes
Incident management: PagerDuty, ServiceNow
APIs and data serialization formats: JSON, YAML
At least 3 years’ experience working with the following:
Distributed Application Support: Experience supporting multiple applications running in a microservices architecture.
Version Management and CI/CD: Git, Spinnaker
Infrastructure Config Management: Puppet, Ansible, Salt
Plus Points
Relevant certifications in any of the key skills (e.g., CKA or CKAD).