Infraveo Technologies

Software Development Engineer - Cloud Infrastructure

India

15 days ago

Summary

This is a remote position.

We are seeking a Software Development Engineer - Cloud Infrastructure to join our team, with expertise in cloud infrastructure, big data, and web crawling technologies. This role bridges site reliability engineering with scalable data extraction, ensuring our infrastructure remains robust and capable of handling high-volume data collection. You will design resilient systems, optimize automation pipelines, and tackle challenges posed by advanced bot-detection mechanisms.

Responsibilities:
  • Architect, deploy, and manage scalable cloud environments (AWS/GCP/DO) that support distributed data processing, handling terabyte-scale datasets and billions of records efficiently.
  • Automate infrastructure provisioning, monitoring, and disaster recovery using tools like Terraform, Kubernetes, and Prometheus.
  • Optimize CI/CD pipelines to ensure seamless deployment of web scraping workflows and infrastructure updates.
  • Develop and maintain stealthy web scrapers using Puppeteer, Playwright, and headless Chromium browsers.
  • Reverse-engineer bot-detection mechanisms (e.g., TLS fingerprinting, CAPTCHA solving) and implement evasion strategies.
  • Monitor system health, troubleshoot bottlenecks, and ensure 99.99% uptime for data collection and processing pipelines.
  • Implement security best practices for cloud infrastructure, including intrusion detection, data encryption, and compliance audits.
  • Partner with data collection, ML and SaaS teams to align infrastructure scalability with evolving data needs.
  • Research emerging technologies to stay ahead of anti-bot solutions such as Kasada, PerimeterX, Akamai, and Cloudflare.


Requirements

  • 4-6 years of experience in site reliability engineering and cloud infrastructure management.
  • Proficiency in Python and JavaScript for scripting and automation.
  • Hands-on experience with Puppeteer/Playwright, headless browsers, and anti-bot evasion techniques.
  • Knowledge of networking protocols, TLS fingerprinting, and CAPTCHA-solving frameworks.
  • Experience with monitoring and observability tools such as Grafana, Prometheus, and Elasticsearch, and with monitoring and optimizing resource utilization in distributed systems.
  • Experience with data lake architectures and optimizing storage using formats such as Parquet, Avro, or ORC.
  • Strong proficiency in cloud platforms (AWS, GCP, or Azure) and containerization/orchestration (Docker, Kubernetes).
  • Deep understanding of infrastructure-as-code tools (Terraform, Ansible).
  • Deep experience in designing resilient data systems with a focus on fault tolerance, data replication, and disaster recovery strategies in distributed environments.
  • Experience implementing observability frameworks, distributed tracing, and real-time monitoring tools.
  • Excellent problem-solving abilities, with a collaborative mindset and strong communication skills.


Benefits

  • Work Location: Remote
  • 5-day work week
