Simplismart

Cloud Infrastructure Engineer

Bengaluru, KA, IN

3 months ago

Summary

About Simplismart

A bit about our product: Simplismart is an MLOps platform with three major suites:

  • Training suite: Assemble and train any model, including LLMs, vision, audio, tabular, and tree models.
  • Deployment suite: Most companies fail to make models production-ready. Our proprietary model deployment suite is 6x faster than Hugging Face’s enterprise suite and 12x faster than replicate.ai. Users can easily deploy and auto-scale models trained on Simplismart (which are further optimised), import any model from Hugging Face, or bring their own model artefact in TensorFlow, PyTorch, ONNX, or JAX format.
  • Observability suite: Monitor model health, including load, latency, uptime, data drift, and concept drift.

Position Overview

As a Cloud Infrastructure Engineer, you will help build a highly available, global, multi-cloud PaaS platform on open-source technologies to support Simplismart’s rapid growth. The platform spans diverse environments (Kubernetes, VMs, and bare-metal compute) and provides a cohesive, reliable abstraction for running AI workloads. You will work with cutting-edge technologies and solve complex problems.

To be successful in this role, you need to be deeply technical, communicate and collaborate well, and have experience with infrastructure-as-code. Proficiency with tools such as Terraform and Ansible, along with strong software development fundamentals, is essential. You should also have solid systems knowledge and strong troubleshooting skills.

Requirements

  • 5+ years of experience in platform engineering and in writing high-performance, well-tested, production-quality code.
  • Proficiency in at least one backend programming language (Python preferred; C++ is a plus).
  • Demonstrated experience with high-performance or distributed cloud microservices architectures.
  • Ideally, experience building and operating infrastructure globally across multiple cloud providers such as AWS, Azure, or GCP.
  • A good understanding of low-level operating system concepts, including multi-threading, memory management, networking, storage, performance, and scale.
  • Pragmatic, methodical, well-organized, detail-oriented, and self-starting.
  • Experience with Kubernetes, containerization, Terraform, and Ansible.
  • Experience with PyTorch or TensorFlow is a plus but not required.
  • Knowledge of GPU programming, CUDA, and NCCL is a plus.

Responsibilities

  • Designing the high-level architecture of the MLOps platform from the ground up.
  • Formalising how diverse GPU-based workloads are handled.
  • Developing a robust internal system for continuously deploying services and modules across diverse environments.
  • Creating frameworks for reliable, fault-tolerant systems that run mission-critical workloads.

Skills And Attributes

  • Deep technical expertise.
  • Strong communication and collaboration skills.
  • Experience in infrastructure-as-code (Terraform, Ansible).
  • Strong software development fundamentals.
  • Good systems knowledge and troubleshooting abilities.
  • Ability to work independently and as part of a team.
  • Proactive and self-motivated.

Why should you join Simplismart?

Well, let's break away from the conventional perks and instead focus on what you WON’T experience here:

  • Legacy System Headaches: You won't have to endlessly grapple with outdated legacy systems that hinder your productivity and creativity.
  • Bossy Culture: At Simplismart, we believe in collaboration and empowerment, not hierarchy. You won't have a boss breathing down your neck; instead, you'll have colleagues who support your growth.
  • Dark Circles: Late nights and overwork are not the norm here. We prioritize work-life balance, ensuring you won't be sporting those tired, dark circles under your eyes.
  • Stagnation: Say goodbye to redundant and stagnant tasks. We thrive on innovation and dynamic challenges that keep you engaged and motivated.

Skills: infrastructure, Prometheus, Grafana, Terraform, Kubernetes
