Inflection AI

Senior Software Engineer (GPU Performance)

Palo Alto, CA, US

16 days ago

Summary

Inflection AI is a public benefit corporation leveraging our world-class large language model to build the first AI platform focused on the needs of the enterprise.


Who we are:

Inflection AI was re-founded in March 2024, and our leadership team has assembled a team of kind, innovative, and collaborative individuals focused on building enterprise AI solutions. We are passionate about what we are building, enjoy working together, and strive to hire people with diverse backgrounds and experience.


Our first product, Pi, is an empathetic, conversational chatbot. Pi is a public instance built on our 350B+ frontier model and our sophisticated fine-tuning (10M+ examples), inference, and orchestration platform. We are now focused on building new systems that directly support the needs of enterprise customers using this same approach.

Want to work with us? Have questions? Learn more below.


About The Role


As a Member of Technical Staff, Senior Software Engineer on our AI Systems team, you’ll focus on end-to-end performance optimization across the entire stack—not just compilers or GPU kernels. You'll work at the intersection of systems engineering and application orchestration, improving throughput, latency, and resource efficiency across our AI infrastructure. Your work will directly impact how we deliver scalable, high-performance solutions in demanding enterprise environments.


This role is right for you if you:

  • Have hands-on experience optimizing performance across the stack—from low-level system code to high-level orchestration.
  • Are skilled in C/C++ and comfortable working within higher-level frameworks to solve end-to-end performance challenges.
  • Understand GPU programming and acceleration techniques (e.g., CUDA), but also take a system-wide view of performance.
  • Work well in fast-moving environments where cross-disciplinary collaboration and speed matter.
  • Enjoy identifying and solving bottlenecks alongside researchers and engineers to push the boundaries of AI system performance.


In this role, you will:

  • Drive system-wide performance improvements across core services, orchestration layers, and ML applications.
  • Partner with engineering and research teams to identify inefficiencies and deliver targeted optimizations.
  • Evaluate and integrate emerging tools and frameworks to maintain best-in-class system scalability and throughput.
  • Lead performance-focused initiatives that impact deployment, inference, and resource utilization across the platform.
  • Play a key role in shaping our technical roadmap, ensuring our systems meet the demands of enterprise-grade AI workloads.

