We seek a hands-on Infrastructure & Systems Team Leader to ensure robust, stable, and efficient data center operations. This role requires technical expertise and leadership to optimize capacity, manage resources, and maintain high availability.
*This role is fully onsite*
Responsibilities:
Ensure 24/7 stability of data centers, servers, storage, and network.
Optimize resources and manage capacity for high-performance AI workloads.
Operate and maintain GPU-based HPC clusters for AI and deep learning workloads.
Manage virtualization environments for efficient workload distribution and scaling.
Implement and enforce security best practices to protect infrastructure and data.
Implement automation and monitoring to enhance efficiency and reliability.
Coordinate with vendors and internal teams for seamless operations.
Qualifications:
10+ years of experience in IT infrastructure, including hands-on data center operations.
Experience collaborating with U.S. service providers, including colocation, networking, cloud, and hardware vendors.
Proven ability to keep data center operations and support running seamlessly.
Strong knowledge of GPUs, high-performance computing (HPC), and high-speed storage.
Experience with virtualization technologies (VMware, Proxmox, KVM, etc.).
Expertise in securing infrastructure and enforcing security best practices.
Experience with capacity planning and performance optimization.