Apple Inc.

Sr. ML Infrastructure Engineer, Apple Data Platform

Cupertino, CA, US

25 days ago

Summary

The Apple Data Platform (ADP) group builds the data platform that enables the next generation of intelligent experiences across all Apple products and services. ADP empowers Apple engineers to deliver ML-driven products and innovations rapidly and at scale. We are looking for an experienced engineer who can bring their passion for machine learning, infrastructure, big data, and distributed systems to build world-class data+ML platforms and products at scale. You will work with many cross-functional teams and lead the planning, execution, and success of technical projects, with the ultimate goal of improving the ML experience for Apple customers. Are you passionate about building scalable, reliable, maintainable infrastructure and solving data problems at scale? Come join us and be part of the Data Infrastructure journey.

Apple Ray leverages open-source Ray to offer a unified framework for processing complex data+ML pipelines. It enables the next generation of intelligent experiences for Apple products and services by combining the data and processing layers into one unified end-to-end workflow. This eliminates the complexity of running multiple independent jobs while significantly improving hardware resource efficiency and development speed. Tight integration with Apple Data services makes Apple Ray the go-to solution for complex, large-scale data and ML pipelines. The team enables future Apple intelligent products by making a cutting-edge ecosystem of data+ML technologies for large-scale, efficient systems available to all data and ML engineers within Apple.

As a member of the Apple Ray team, your responsibilities will include:
* Designing, implementing, and maintaining distributed systems to build world-class ML platforms and products at scale
* Diagnosing, fixing, and automating the resolution of complex issues across the entire stack to ensure maximum uptime and performance
* Designing and extending services to improve the functionality and reliability of the platform
* Monitoring system performance, optimizing for cost and efficiency, and resolving any issues that arise
* Building relationships with stakeholders across the organization to better understand internal customer needs and improve our product for end users

Qualifications:
* Understanding of the ML lifecycle and state-of-the-art ML infrastructure technologies
* Experience with GPU and other types of HPC infrastructure
* Experience with training frameworks such as PyTorch, TensorFlow, and JAX
* Deep understanding of Ray and KubeRay
* Experience with ML training/inference profiling and optimization
