Join Tesla's Dojo Performance Team to design and optimize cutting-edge system-level simulation frameworks for AI accelerators. You will simulate the performance of thousands of Dojo compute nodes operating in parallel to drive state-of-the-art machine learning (ML) workloads. This role centers on modeling large-scale AI training systems to evaluate the performance of new kernels and mapping strategies. By analyzing trade-offs between memory, compute, and communication across system resources, you will help push the boundaries of AI performance and efficiency.
* Develop system-level simulation frameworks to model the performance of massively parallel AI accelerators, including compute distribution, memory hierarchy, interconnects, and dataflow
* Simulate and analyze how large-scale ML workloads, from FSD to LLMs, are mapped and executed across thousands of Dojo compute nodes
* Collaborate with ML architects, kernel developers, and system engineers to ensure simulations reflect real-world AI training requirements
* Design and implement tests to evaluate trade-offs in system resources, including memory bandwidth, capacity, latency, and compute, to optimize performance for large-scale AI workloads
* Build and maintain software tools and frameworks to support simulation development, testing, and integration
* Conduct performance analysis to identify bottlenecks and propose system-level optimizations
* Stay current with advancements in ML model architectures, parallel computing, and system-level simulation techniques
* Participate in code reviews, debugging, and testing to ensure robust and scalable simulation frameworks
* Degree in Computer Science, Electrical Engineering, or a related field, evidence of exceptional skill in these areas, or equivalent experience
* Strong proficiency in C++ for developing high-performance simulation frameworks
* Solid understanding of ML/deep learning model architectures, including how models are partitioned and mapped across multiple devices; good understanding of compute architecture, memory hierarchies, and dataflows
* Experience in system-level simulation, parallel computing, or ML workload optimization
* Knowledge of kernel development processes and how ML workloads are deployed on hardware accelerators
* Familiarity with analytical simulation techniques for modeling high-level system behavior
* Excellent problem-solving skills, with the ability to analyze complex systems and propose innovative solutions
* Strong communication and collaboration skills to work effectively with cross-functional teams, including ML researchers, architects, and engineers
* Ability to work onsite in our Palo Alto, CA office