Sky Systems, Inc. (SkySys)

Senior Site Reliability Engineer

Costa Rica

5 days ago

Summary

Role: Senior Site Reliability Engineer (SRE)

Position Type: Full-Time Contract (40 hrs/week)

Contract Duration: 12 months+

Work Schedule: 8 hours/day (Mon-Fri)

Work Timezone: US Time

Location: 100% Remote


We're looking for a Senior Site Reliability Engineer (SRE) to join our Innovation Team, where we're building the next generation of AI-powered SaaS solutions. This is a high-impact, hands-on role supporting a fast-moving, multidisciplinary engineering team—including Angular developers, Node.js engineers, and data scientists working with OpenAI and agentic AI architectures.

You'll play a critical role in ensuring our infrastructure is scalable, resilient, observable, and automated to support production-grade applications and machine learning workloads.


Must-Have Qualifications:

  • 5+ years in SRE, DevOps, or Cloud Infrastructure roles supporting production environments
  • Advanced expertise in Microsoft Azure (compute, networking, identity, and security)
  • Strong experience with CI/CD using Azure DevOps and GitHub Actions
  • Infrastructure as Code skills using Terraform
  • Proficiency in Python (scripting/automation) and comfort working with Node.js
  • In-depth knowledge of Docker, Helm, Flux, AKS, and containerized architectures
  • Production experience managing and scaling MongoDB
  • Familiarity with Databricks and ML pipeline operations
  • Hands-on experience with Dynatrace for observability and monitoring
  • Exposure to AI/LLM-based production workloads (e.g., OpenAI APIs, agentic AI systems)
  • Willingness to provide after-hours and weekend support
