HTX (Home Team Science & Technology Agency)

Lead Engineer - AI Infrastructure, HTxAI

Singapore

9 days ago

Summary

HTX is the world’s first Science and Technology agency for Public Safety and Security. As a statutory board of the Ministry of Home Affairs and an integral part of the Home Team, we share the mission to amplify, augment and accelerate the Home Team’s advantage in securing Singapore as the safest place on planet Earth.


The jobholder will be part of the AI Infrastructure team, a foundational piece in enabling the HTxAI movement. The multi-disciplinary team envisions, designs, implements, tests, delivers, and sustains MHA’s next-generation on-premise AI/ML & Cloud infrastructure. These technologies and solutions take an integrative approach across software, hardware and algorithms to drive computational accelerators across diverse workloads, including AI, machine learning, large-scale search, simulations, and security for AI.


As the Lead Engineer in AI Infrastructure, you will play a pivotal role in the design, development and integration of accelerated computing solutions, as part of the continual development of the on-premise enterprise infrastructure, to meet the diverse applications in public safety and security. The Lead Engineer is also expected to manage daily infrastructure operations and constantly innovate to optimise the on-premise infrastructure, with the objective of ensuring robust, scalable and secure hosting infrastructure services for Home Team operations.


We are looking for highly technical individuals who are passionate about technology, love to solve problems, are able to code and have a sound engineering mind. The ideal candidate dares to innovate, takes ownership, is a great team player, inspires excellence, is a skilled learner and embraces failures.


What you would be working on:

  • Design, build, and sustain the on-premise AI & Cloud infrastructure for MHA.
  • Innovate using Lean infrastructure operations principles in managing high-performance computing environments, including GPU and other advanced compute clusters and cloud-based solutions.
  • Ensure the scalability, reliability, and security of digital infrastructure resources to support diverse user software, AI/ML and operational software workloads.
  • Innovate in accelerated computing by exploring novel algorithms and hardware solutions that push the boundaries of current computational capabilities, in the context of Home Team operations.
  • Implement best practices for infrastructure as code (IaC), configuration management, and version control.
  • Design, develop and integrate scalable accelerated computing architectures for complex computational tasks.
  • Ensure compliance with security policies and regulatory requirements.


What are we looking for:

  • Bachelor’s or Master’s degree in Computing / Computer Science / Engineering, or a related discipline.
  • 2-5 years of work experience in infrastructure engineering, with a focus on software development, software engineering and/or AI engineering.
  • Strong knowledge of cloud platforms (e.g., AWS, Google Cloud, Azure) and on-premises infrastructure.
  • Proficiency in programming and infrastructure automation tools such as C++, Terraform, Cisco ACI scripting or similar.
  • Experience with setting up Kubernetes clusters, containerisation and workload scheduling, or similar.
  • Familiarity with database and storage solutions (e.g. PostgreSQL, Redis, MongoDB, as well as vector databases).
  • Understanding of security best practices and compliance requirements for handling confidential data.
  • Excellent problem-solving skills and the ability to work independently and collaboratively in a fast-paced environment.
  • Strong communication skills and the ability to articulate complex technical concepts to non-technical stakeholders.


Preferred

  • Experience with MLOps and serving AI models, including distributed training and distributed inference.
  • Knowledge of or experience with security solutions for AI/ML.
  • Certifications in cloud platforms or infrastructure engineering.
  • Experience in implementing data processing technologies (e.g. Kafka), big data technologies and distributed systems.


*** All new hires are appointed on a two-year contract in the first instance and will be assessed and considered for permanent tenure over time, based on performance. As part of the shortlisting process for this role, you may be required to complete a medical declaration and/or undergo further assessment. All applicants will be updated on the status of their applications within 4 weeks of the closing of the advertisement.
