Senior Software Engineer (ML Training Infrastructure)
Inflection AI is a public benefit corporation leveraging our world-class large language model to build the first AI platform focused on the needs of the enterprise.
Who we are:
Inflection AI was re-founded in March 2024, and our leadership team has assembled a group of kind, innovative, and collaborative individuals focused on building enterprise AI solutions. We are passionate about what we are building, enjoy working together, and strive to hire people with diverse backgrounds and experience.
Our first product, Pi, is an empathetic, conversational chatbot. Pi is a public example of what we build from our 350B+ frontier model using our sophisticated fine-tuning (10M+ examples), inference, and orchestration platform. We are now focused on building new systems that directly support the needs of enterprise customers using this same approach.
Want to work with us? Have questions? Learn more below.
About the Role
As a Senior Software Engineer on the ML Training Infrastructure team, you’ll design and operate the systems that power large-scale machine learning workflows—from model training through to production deployment. You'll develop control planes, manage distributed compute clusters, and build the tooling that ensures our platform remains reliable, secure, and highly scalable. We're looking for engineers with hands-on experience running ML infrastructure in production and a strong open-source mindset.
This role is a strong fit if you:
In this role, you will:
ML infrastructure, Kubernetes, SLURM, Ray, distributed systems, model training, production ML, control planes, ML tooling, orchestration, infrastructure security, scalable systems, AI workflows, open-source infrastructure, machine learning deployment