Cerebras Systems

Senior AI Infrastructure Tools Engineer

Sunnyvale, CA, US

Remote
Full-time
9 days ago

Summary

Responsibilities

* The AI Infrastructure Tools Engineer will be responsible for driving the tools effort required to provide high uptime and service availability of AI Infrastructure at Cerebras.
* Architect, design, and develop frameworks and tools for monitoring, operations, and maintenance of AI Infrastructure.
* Collaborate with Cluster Deployment, Network Operations, and Cluster Operations teams to understand their needs and ensure tools meet their requirements.
* Identify areas for improvement and implement new tools or technologies to enhance AI infrastructure efficiency, reliability, and security.
* Use AI to analyze data and identify trends, patterns, and anomalies.
* Develop User Interface (UI), reporting, analytics, visualizations, and dashboards consumable by engineers, leadership, and customers.

Skills And Qualifications

* Bachelor's degree or higher in Electrical Engineering, Computer Engineering, or Computer Science.
* 10+ years of experience designing and building infrastructure tools.
* Must be proficient in Python and/or Go development.
* Expertise with cloud platforms like AWS, Azure, or Google Cloud.
* Manage tooling infrastructure using code, enabling repeatable and consistent deployments.
* Experience with containerization (e.g., Docker) and orchestration (e.g., Kubernetes).
* Expertise in setting up and maintaining monitoring and logging systems.
* Experience with network monitoring and analytics tools like Prometheus and Grafana, and familiarity with gNMI, OpenConfig, OpenTelemetry, or New Relic.
* Experience building dashboards and analytics. Understanding of UX/UI design techniques.
* Excellent problem-solving and analytical skills.
* Strong communication and collaboration skills.
