Access Data Consulting Corporation

Site Reliability Engineer

Denver, CO, US

19 days ago

Summary

Job Title: Site Reliability Engineer

Location: Denver, CO (Hybrid or On-site)

Department: Engineering / Infrastructure

Reports To: Senior Manager, Systems Reliability Engineering

No third-party vendors or referrals.

Job Summary

We are seeking a Site Reliability Engineer (SRE) to join our growing infrastructure and observability team in Denver, CO. This engineering role plays a key part in designing, implementing, and maintaining system monitoring and observability solutions for a highly complex enterprise network. The ideal candidate will work closely with software developers, network engineers, and platform teams to ensure full-stack observability—including APM, NPM, SNMP monitoring, log aggregation, and JVM metrics.

This is a collaborative and hands-on role requiring both technical depth and cross-functional communication skills. The successful candidate will also support automation efforts, revision control, and WiFi monitoring initiatives to ensure optimal system performance and reliability.

Major Duties and Responsibilities

  • Design, implement, enhance, and troubleshoot observability artifacts across cloud and on-premises infrastructure.
  • Collaborate with developers to integrate observability into applications and services.
  • Build and maintain monitoring dashboards in Splunk, Datadog, and Grafana.
  • Develop observability strategies based on architecture documents and release notes for services and WiFi hardware.
  • Deploy and maintain APM, NPM, SNMP, and JVM monitoring tools to support system performance and reliability.
  • Automate monitoring and deployment processes using Bash and Python.
  • Maintain accurate and clear documentation of release notes and observability implementations.
  • Use Git for revision control and contribute to collaborative development workflows.
  • Support and contribute to network and wireless performance analysis initiatives.

Required Qualifications

Skills / Abilities / Knowledge

  • Solid understanding of network design, architecture, protocols, and the OSI model.
  • Proficiency with TCP/IP, SNMP, and basic networking appliances.
  • Strong working knowledge of Linux/Unix environments (RHEL, Ubuntu, SUSE, Rocky Linux).
  • Familiarity with AWS cloud infrastructure and deployment methodologies.
  • Proficient with APM suites (Datadog, Grafana) and log aggregation tools (Splunk, Loki).
  • Experience in JVM monitoring and network device monitoring.
  • Proficient in scripting with Bash and Python for automation tasks.
  • Understanding of WiFi monitoring and performance analysis.
  • Skilled in writing clear technical documentation.
  • Comfortable using ticketing systems and collaborating in fast-paced environments.

Education

  • Bachelor’s or Master’s Degree in Engineering, Computer Science, or related field—or equivalent relevant work experience.

Experience

  • 5+ years of experience in a Site Reliability Engineering or similar technical engineering role.
  • 5+ years of hands-on experience with wireless networking and performance analysis.
  • 5+ years of experience managing technical projects and working in cross-functional teams.

Preferred Qualifications

  • Industry certifications related to networking, AWS, or observability (e.g., AWS Certified SysOps Administrator, CCNA, Datadog certification).
  • Experience with container orchestration (e.g., Kubernetes) and infrastructure as code tools (e.g., Terraform, Ansible).
  • Prior experience working in Agile environments.
