Site Reliability Engineer

Bengaluru, KA, IN

7 days ago

Summary

About Media.net:

Media.net is a leading global ad tech company focused on creating the most transparent and efficient path for advertiser budgets to become publisher revenue. Our proprietary contextual technology is at the forefront of enhancing programmatic buying, the latest industry standard in ad buying for digital platforms.


The Media.net platform powers major global publishers and ad-tech businesses at scale across ad formats such as display, video, mobile, native, and search. Media.net’s U.S. HQ is in New York, and its Global HQ is in Dubai. With office locations and consultant partners across the world, Media.net takes pride in the value it adds, in both products and services, for its 50+ demand partners and 21K+ publisher partners.


Responsibilities (What You’ll Do)

Infrastructure Management:

  • Oversee and maintain the infrastructure that supports the ad exchange applications. This includes load balancers, data stores, CI/CD pipelines, and monitoring stacks.
  • Continuously improve infrastructure resilience, scalability, and efficiency to meet the demands of massive request volume and stringent latency requirements.
  • Develop policies and procedures that improve overall platform stability, and participate in the shared on-call schedule.

Collaboration with Developers:

  • Work closely with developers to establish and uphold quality and performance benchmarks, ensuring that applications meet necessary criteria before they are deployed to production.
  • Participate in design reviews and provide feedback on infrastructure-related aspects to improve system performance and reliability.

Building Tools for Infra Management:

  • Develop tools to simplify and enhance infrastructure management, automate processes, and improve operational efficiency.
  • These tools may address areas such as monitoring, alerting, deployment automation, and failure detection and recovery, which are critical in minimizing latency and maintaining uptime.

Performance Optimization:

  • Focus on reducing latency and maximizing efficiency across all components, from request handling in load balancers to database optimization.
  • Implement best practices and tools for performance monitoring, including real-time analysis and response mechanisms.

Who Should Apply

  • B.Tech/M.Tech or equivalent in Computer Science, Information Technology, or a related field.
  • 2–4 years of experience managing services in large-scale distributed systems.
  • Strong understanding of networking concepts (e.g., TCP/IP, routing, SDN) and modern software architectures.
  • Proficiency in programming and scripting languages such as Python, Go, or Ruby, with a focus on automation.
  • Experience with container orchestration tools like Kubernetes and cloud platforms (preferably GCP).
  • Ability to independently own problem statements, manage priorities, and drive solutions.


Preferred Skills & Tools Expertise:

  • Infrastructure as Code: Experience with Terraform.
  • Configuration Management: Experience with tools such as Nix or Ansible.
  • Monitoring and Logging Tools: Expertise with Prometheus, Grafana, or ELK stack.
  • OLAP Databases: Experience with ClickHouse or Apache Druid.
  • CI/CD Pipelines: Hands-on experience with Jenkins or ArgoCD.
  • Databases: Proficiency in MySQL (relational) or Redis (NoSQL).
  • Load Balancers and Servers: Familiarity with HAProxy or Nginx.
  • Strong knowledge of operating systems and networking fundamentals.
  • Experience with version control systems such as Git.
