As a member of the Dojo Machine Learning team, you will be responsible for developing and optimizing simulations of the architecture of a massively parallel machine for AI training. The ideal candidate will have a strong background in computer architecture, analytical and cycle-based simulation, and AI workloads, with a passion for delivering high-performance simulations that accurately model complex systems.
* Develop and validate simulations of the architecture of a massively parallel machine for AI training, including system architecture, core architecture, memory hierarchy, and interconnects
* Collaborate with architects and engineers to understand the requirements of the simulation and ensure that it accurately models the behavior of the system
* Develop and maintain software frameworks and tools to support simulation development, testing, and deployment
* Work closely with the team to ensure seamless integration of simulations with other components of the system
* Participate in code reviews, testing, and debugging to ensure high-quality software
* Stay up to date with the latest developments in AI workloads, computer architecture, and simulation techniques
* Degree in Engineering, Computer Science, or equivalent experience and evidence of exceptional ability
* 3+ years of experience in simulation development, computer architecture, and AI workloads
* Experience with analytical and cycle-based simulation techniques, including modeling of complex systems and validation of simulation results
* Strong programming skills in languages such as C++ and Python
* Experience with deep learning frameworks such as PyTorch and JAX
* Strong understanding of CPU and/or GPU microarchitecture, including pipelining, caching, and memory hierarchy
* Excellent problem-solving skills, with the ability to analyze complex problems and develop creative solutions
* Strong communication and collaboration skills, with the ability to work effectively with architects, engineers, and researchers
* Able to work from the Palo Alto office