Key Responsibilities:
AI Production Deployment:
· Lead end-to-end transitions of AI PoCs into production environments, managing the entire process from testing to final deployment.
· Configure, install, and validate AI systems on key platforms, including VMware ESXi and vSphere for server virtualization, Linux (Ubuntu/RHEL) and Windows Server for operating system integration, and Docker and Kubernetes for containerization and orchestration of AI workloads.
· Conduct comprehensive performance benchmarking and AI inferencing tests to validate system performance in production.
· Optimize deployed AI models for accuracy, performance, and scalability to ensure they meet production-level requirements and customer expectations.
Technical Expertise:
· Serve as the primary technical lead/SME for AI PoC deployments in enterprise environments, focusing on AI solutions powered by NVIDIA GPUs.
· Work hands-on with NVIDIA AI Enterprise and GPU-accelerated workloads, ensuring efficient deployment and model performance using frameworks such as PyTorch and TensorFlow.
· Lead technical optimizations aimed at resource efficiency, ensuring that models are deployed effectively within the customer’s infrastructure.
· Ensure the readiness of customer environments to handle, maintain, and scale AI solutions post-deployment.
Project Management:
· Take ownership of AI project deployments, overseeing all phases from planning to final deployment and ensuring that timelines and deliverables are met.
· Collaborate with stakeholders, including cross-functional teams (e.g., Lenovo AI Application, solution architects), customers, and internal resources to coordinate deployments and deliver results on schedule.
· Implement risk management strategies and develop contingency plans to mitigate potential issues such as hardware failures, network bottlenecks, and software incompatibilities.
· Maintain ongoing, transparent communication with all relevant stakeholders, providing updates on project status and addressing any issues or changes in scope.
Knowledge Transfer and Documentation:
· Develop and deliver detailed documentation for each AI deployment, covering installation procedures, system configurations, and validation reports, ensuring operational teams have clear guidance on managing the deployed systems.
· Conduct post-deployment knowledge transfer sessions to educate client and Lenovo Managed Services teams on managing AI infrastructure, troubleshooting common issues, and optimizing AI models.
· Provide comprehensive training sessions on the operation, management, and scaling of AI systems, ensuring that customers are fully prepared for ongoing operations post-handoff.
Educational Background:
· Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience in AI infrastructure deployment.
Experience:
· 7-10 years of overall experience.
· 2-4 years of relevant experience deploying AI/ML models and AI solutions using NVIDIA GPUs in enterprise production environments.
· Demonstrated success in leading and managing complex AI infrastructure projects, including PoC transitions to production at scale.
Technical Expertise:
· Extensive experience with NVIDIA AI Enterprise, GPU-accelerated workloads, and AI/ML frameworks such as PyTorch and TensorFlow.
· Proficient in deploying AI solutions across enterprise platforms, including VMware ESXi, Docker, Kubernetes, Linux (Ubuntu/RHEL), and Windows Server environments.
· MLOps proficiency with hands-on experience using tools such as Kubeflow, MLflow, or AWS SageMaker for managing the AI model lifecycle in production.
· Strong understanding of virtualization and containerization technologies to ensure robust and scalable deployments.
Certifications (Not All Required):
· Certifications in NVIDIA AI Enterprise, VMware, cloud integration, server platforms, machine learning, or data analytics are highly desirable.
· NVIDIA Certified Solutions Architect or related certification is advantageous.
Key Attributes:
· Adaptability: Ability to quickly understand and integrate into new organizational environments and infrastructure.
· Problem-Solving Mindset: Strong skills in identifying and resolving technical issues, and in developing innovative solutions to meet project goals.
· Communication Skills: Excellent communication skills, with experience engaging both technical and non-technical stakeholders to ensure alignment and understanding.
· Attention to Detail: Ability to create comprehensive and precise documentation and training materials that ensure smooth knowledge transfer and long-term customer success.
Travel Requirements:
This role requires the ability and willingness to travel up to 50% of the time to collaborate with clients on-site during PoC testing, system deployment, and production rollouts.
In this role, you will have the opportunity to:
· Lead the deployment of AI solutions, playing a key role in accelerating AI innovation and driving AI adoption across a wide range of industries.
· Build a diverse and impressive portfolio of successful AI deployments in enterprise environments.
· Shape the future of enterprise AI adoption through your contributions to hands-on implementations and strategic guidance for customers and internal teams.