EVONA

DevOps Engineer

Singapore

19 days ago

Summary

DevOps Engineer (AI/ML)

Location: Remote

About Us

We are a fast-growing AI technology company at the forefront of transforming the ecommerce and fashion industries. Our mission is to revolutionize these spaces by developing innovative AI-driven solutions that make a tangible impact. We thrive in a dynamic, fast-paced environment where creativity meets cutting-edge technology. If you're passionate about DevOps, cloud infrastructure, and AI/ML, this is the perfect opportunity for you to join an industry-leading team and drive real change.

Role Overview

We are looking for an experienced DevOps Engineer to play a key role in managing and scaling our AI/ML infrastructure. The ideal candidate will bring a blend of software development, IT operations, and cloud infrastructure expertise, with a focus on automation, security, and continuous improvement. You will work closely with engineering and design teams to optimize GPU workflows, enhance application performance, and streamline deployments.

Key Responsibilities

  • Manage and optimize GPU-based ML/AI workflows and infrastructure.
  • Design, implement, and maintain scalable tools for secure and efficient software development life cycles (SDLC).
  • Collaborate with development teams to troubleshoot issues, improve performance, and strengthen system reliability.
  • Build and maintain CI/CD pipelines, focusing on automation and infrastructure-as-code to ensure seamless deployments.
  • Monitor, troubleshoot, and optimize cloud infrastructure to ensure high performance and scalability.
  • Stay current with emerging DevOps, cloud, and security trends, and apply them to improve processes.
  • Ensure compliance with industry standards and best practices for security, scalability, and reliability.
  • Foster a culture of technical excellence and continuous learning across the team.

Qualifications

  • Bachelor’s degree in Computer Science or Engineering, or equivalent practical experience.
  • Proven experience as a DevOps Engineer or Site Reliability Engineer (SRE) with a strong portfolio of relevant projects.
  • Deep expertise in AWS or Azure, with hands-on experience deploying and managing applications in a cloud environment.
  • Strong experience with GPU infrastructure, NVIDIA tools, and managing AI/ML workflows.
  • Proficiency in Kubernetes, Docker, and CI/CD tools for managing containerized applications.
  • Experience with Lustre file systems and optimizing GPU workflows in Kubernetes is highly preferred.
  • Solid understanding of SQL/NoSQL databases, web servers, and UI/UX design principles.
  • Strong problem-solving skills and attention to detail, with the ability to thrive in a fast-paced, deadline-driven environment.
  • Excellent communication and collaboration skills, with a team-oriented mindset.
  • Applicants must provide a link to a GitHub portfolio or other relevant project work.

Benefits

  • Competitive salary with performance-based incentives.
  • Comprehensive health and wellness benefits.
  • Opportunities for professional growth and continuous learning.
  • Fully remote work environment.
  • Join an innovative company that is reshaping the future of fashion and AI.
  • Work alongside industry leaders and influencers at the cutting edge of technology.
