Our AI/ML engineering team ensures Code and Theory delivers innovative, immersive web experiences that delight our clients and their customers. We are always striving to balance the challenges of working with cutting-edge technologies against the real-world demands of high performance, high security, and accessibility. Working in collaboration with our multi-disciplinary engineering, design, and quality assurance teams, you will build software that solves real-world problems for incredible clients.
We are seeking an experienced Lead ML+DevOps Engineer. The ideal candidate will have strong expertise in cloud deployment, containerization, and related technologies, and will play a crucial role in the scalability and reliability of our AI/ML infrastructure. You’ll be in a high-visibility role, working with all sorts of clients both internally and externally to deliver scalable, precise, and – most importantly – interesting machine learning solutions. Our work stretches from audience segmentation to dynamic content generation, from spell-checking to large language modeling and beyond.
WHAT YOU’LL DO
- Design and implement end-to-end MLOps pipelines.
- Manage the full ML lifecycle, including data management, model versioning, experiment tracking, rigorous model testing and monitoring (beyond standard application testing), and complete automation of the ML pipeline.
- Configure and manage cloud-based resources (e.g., AWS, GCP, Azure) to support AI/ML workloads, leveraging containerization as a component within a broader MLOps strategy.
- Automate the entire ML deployment and management process, including model building, testing, and monitoring, through specialized MLOps scripts and tools.
- Collaborate with data scientists and engineers to understand their unique ML requirements and develop tailored MLOps solutions addressing data, model, and deployment complexities.
- Monitor and optimize AI/ML infrastructure performance, focusing on model-specific metrics and potential bottlenecks in the ML pipeline.
- Stay up to date with industry trends and best practices in MLOps, applying this knowledge to continuously improve our organization's capabilities in managing the full ML lifecycle.
WHAT YOU’LL NEED
- Extensive experience designing, building, and deploying end-to-end MLOps pipelines in cloud environments, encompassing data management, model development, deployment, and monitoring (beyond just basic model deployment).
- Strong expertise in containerization (Docker) and orchestration (e.g., Kubernetes, ECS) within the context of deploying and managing ML workloads.
- Proficiency in Infrastructure as Code (IaC) using Terraform for managing cloud resources, specifically tailored for AI/ML infrastructure.
- Hands-on experience with streaming data platforms (e.g., Kafka, Kinesis) for real-time feature engineering and model serving.
- Solid understanding of data engineering principles, including data cleaning, transformation, feature engineering, and ETL/ELT processes relevant to ML datasets.
- Proven experience building and managing CI/CD/CT (Continuous Training) pipelines using tools like Jenkins or GitLab CI, specifically for automated model building, testing, and deployment.
- Strong programming skills in Python, with a solid understanding of ML workflows and familiarity with ML frameworks (e.g., TensorFlow, PyTorch) for model development and deployment.
- Excellent problem-solving skills with a focus on the unique challenges of deploying and maintaining machine learning systems in production (e.g., model drift, performance degradation).
- Strong communication skills with the ability to effectively convey technical MLOps concepts to both technical and non-technical stakeholders, including data scientists and business teams.
ABOUT US
Born in 2001, Code and Theory is a digital-first creative agency that sits at the center of creativity and technology. We pride ourselves on not only solving consumer and business problems, but also helping to establish new capabilities for our clients. With a global client roster of Fortune 100s and start-ups alike, we crave the hardest problems to solve. With a remote-first approach to our people, we have teams distributed across North America, South America, Europe, and Asia. The Code and Theory global network of agencies is growing and includes Kettle, Instrument, Left Field Labs, Mediacurrent, Rhythm, and TrueLogic.
Striving never to be pigeonholed, we work across every major category: from tech to CPG, financial services to travel & hospitality, government and education to media and publishing. We value the collaboration with our client partners, including but not limited to Adidas, Amazon, Con Edison, Diageo, EY, J.P. Morgan Chase, Lenovo, Marriott, Mars, Microsoft, Thomson Reuters, and TikTok.
The Code and Theory network comprises nearly 2,000 people: 50% engineers and 50% creative talent. We’re always on the lookout for smart, driven, and forward-thinking people to join our team.