GenBio AI

Lead Data Engineer

Palo Alto, CA, US

25 days ago

Summary

Headquartered in Silicon Valley, we are a newly established start-up where a collective of visionary scientists, engineers, and entrepreneurs is dedicated to transforming the landscape of biology and medicine through the power of Generative AI. Our team comprises leading minds and innovators in AI and Biological Science, pushing the boundaries of what is possible. We are dreamers who envision a new paradigm for biology and medicine.

We are committed to decoding biology holistically and enabling the next generation of life-transforming solutions. As the first mover in pan-modal Large Biological Models (LBMs), we are pioneering a new era of biomedicine, with our LBM training leading to ground-breaking advancements and a transformative approach to healthcare. Our exceptionally strong R&D team and leadership in LLMs and generative AI position us at the forefront of this revolutionary field. With headquarters in Silicon Valley, California, and a branch office in Paris, we are poised to make a global impact. Join us as we embark on this journey to redefine the future of biology and medicine through the transformative power of Generative AI.

Key Responsibilities:

Lead the strategic design of a holistic solution to our large and diverse data usage needs
Set the collaboration and reusability strategy for data consumption, including publicly available and partner-generated data
Ensure that our data storage and retrieval strategy follows the FAIR principles (Findable, Accessible, Interoperable, Reusable)
Build and maintain scalable, efficient, and reusable data products and codebases for large-scale foundation model training, adaptation, evaluation, and inference
Collaborate closely with data engineers and research scientists to integrate models into production environments
Ensure code quality, scalability, and performance through rigorous testing and code reviews

Qualifications:

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field. Experience in life sciences or healthcare is required
Strong familiarity with at least some (the more the better) of the following biomedical data types: sequencing data, other high-throughput omics data, biological imaging data, clinical and phenotypic data
Experience using (experience developing is an advantage) large-scale data products and systems for biological or biomedical applications
Strong programming skills in JavaScript, Python, and modern web development frameworks, and familiarity with GPU-accelerated tools (e.g., CUDA, cuDNN, Triton)
Knowledge of major deep learning frameworks such as PyTorch, HuggingFace Transformers & Accelerate, or Megatron-LM/DeepSpeed
Familiarity with resource management and scheduling systems (e.g., SLURM, Kubernetes)
Proficiency in back-end frameworks like Django, Flask, or Node.js, and database technologies (e.g., PostgreSQL, MongoDB)
Expertise in distributed systems, cloud computing (AWS, GCP), and containerization tools (Docker, Kubernetes)

Preferred Qualifications:

Prior experience pre-training or serving large language models or large-scale foundation models
Experience with deep learning workflows
Knowledge of the challenges of bioinformatics and experience with bioinformatics tools
Familiarity with version control systems like Git and CI/CD pipelines
Strong understanding of RESTful APIs, authentication, and deployment pipelines
Familiarity with machine learning workflows and biological datasets

Join us as we embark on this journey to redefine the future of biology and medicine.

We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.
