Super Micro Computer, Inc.

Staff Architect, AI Infrastructure

San Jose, CA, US

Hybrid
Full-time
$168k–$184k/year
6 days ago

Summary

Job Req ID: 26676

About Supermicro:

Supermicro is a top-tier provider of advanced server, storage, and networking solutions for Data Center, Cloud Computing, Enterprise IT, Hadoop/Big Data, Hyperscale, HPC, and IoT/Embedded customers worldwide. We are the #5 fastest growing company among the Silicon Valley Top 50 technology firms. Our unprecedented global expansion has provided us with the opportunity to offer a large number of new positions to the technology community. We seek talented, passionate, and committed engineers, technologists, and business leaders to join us.

Job Summary:

The Supermicro IT team is seeking a visionary Staff Architect, AI Infrastructure to lead the architecture and scaling of GPU-accelerated infrastructure optimized for AI and machine learning workloads. This role requires deep system-level expertise, automation skills, and hands-on experience designing infrastructure at scale. You will architect integrated compute, network, and cooling systems that support next-generation AI platforms while ensuring operational efficiency and future readiness.

Essential Duties and Responsibilities:

* Hyperscaler-Grade Infrastructure Design: Design and scale high-performance infrastructure inspired by hyperscalers (e.g., NVIDIA DGX SuperPOD, Meta RSC, Azure NDv5, AWS Trainium clusters), with a focus on modularity, density, and operability.
* System-Level Architecture: Lead the integration of compute, networking, storage, and power systems for high-density GPU workloads (NVIDIA, AMD, Intel Gaudi), ensuring system-wide performance optimization.
* Automation & Orchestration: Build and standardize infrastructure provisioning, deployment, and monitoring via infrastructure-as-code tools (Terraform, Ansible, Python), ensuring repeatability and scale.
* AI-Ready Network Design: Architect East-West GPU interconnects and North-South data ingress/egress paths using InfiniBand (HDR/NDR) and high-speed Ethernet (100G/400G), with support for VXLAN, BGP, and EVPN.
* Liquid & Air Cooling Infrastructure: Design and oversee deployment of air- and liquid-cooled racks, PDUs, containment solutions, and backup power systems tailored for thermally intensive AI workloads.
* Observability & Monitoring: Implement telemetry and health metrics to proactively manage system performance and lifecycle states.
* Infrastructure Documentation & Standards: Create robust documentation for reference architectures, operational playbooks, and lifecycle workflows to support global deployments.
* Cross-Functional Leadership: Collaborate with ML platform teams, data scientists, hardware architects, and facility engineers to align infrastructure capabilities with AI platform needs.
* Technology & Market Evaluation: Analyze and influence roadmap decisions by staying current on industry trends from NVIDIA, AMD, Intel, and cloud hyperscalers.

Qualifications:

* 10+ years in data center infrastructure or hyperscaler-scale compute environments, ideally with AI or HPC workloads
* Bachelor's degree or equivalent experience
* Proven success architecting GPU infrastructure using NVIDIA, AMD, or Intel Gaudi platforms
* Hands-on experience with large-scale data center deployments, including mechanical/electrical design and containment
* Strong automation experience
* Deep knowledge of RDMA, InfiniBand, Ethernet, and overlay networks
* Experience with bare-metal orchestration for GPU environments
* Experience with hyperscaler environments or colocation data centers supporting AI workloads
* Experience supporting AI/ML workloads across hybrid cloud environments
* Strong business acumen: able to balance performance, cost, and scalability in architecture decisions

Salary Range: $168,000 - $184,000

The salary offered will depend on several factors, including your location, level, education, training, specific skills, years of experience, and comparison to other employees already in this role.
In addition to a comprehensive benefits package, candidates may be eligible for other forms of compensation, such as participation in bonus and equity award programs.

EEO Statement

Supermicro is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of Supermicro to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.

Job Segment: Cloud, Data Center, Electrical, Architecture, Technology, Engineering
