Your Role
- Design and prototype vision‑language architectures that fuse video streams with text for “sense‑plan‑act” use cases.
- Build end‑to‑end training pipelines on a small in‑house H200 GPU cluster; handle distributed training, evaluation and iteration.
- Curate, augment and manage large‑scale video‑text datasets, including synthetic caption generation, to drive continuous model improvement.
- Create automated testing and benchmarking suites (video QA, captioning, instruction‑following) and turn findings into model refinements.
- Collaborate with a lean team to set technical direction and transition research code into deployable services.
- Work with the robotics team on the action (“A”) component of the vision‑language‑action (VLA) model.
Preferred To Have
- 5–7 years of deep‑learning experience in either:
  - video computer vision (e.g., action recognition, temporal transformers, video classification), or
  - large‑scale NLP / LLM fine‑tuning.
- Proven experience training large models on multi‑GPU clusters (PyTorch, distributed training, mixed precision).
- Strong Python engineering, data pipelines, and experiment hygiene.
- Comfortable wrangling big video datasets (IO, sampling, augmentation) or large text corpora.
- Effective communicator in small teams.
Nice‑to‑haves
- Direct experience with video‑text or vision‑language models (e.g., Video‑LLM, Flamingo, Idefics).
- Publications or open‑source contributions in multimodal learning.
- Cross‑modal fusion tricks (Q‑Former, Perceiver Resampler).
- Synthetic captioning, retrieval augmentation, RLHF.
- Sense‑plan‑act / embodied AI projects.
- Inference optimization (TensorRT, quantization).
What We Offer
A competitive salary & stock options
The option to work in a brand‑new office on the AI Campus in Berlin, where we work closely with other exciting AI ventures
Be at the forefront of defining what artificial intelligence means in manufacturing
Gain hands-on experience working at an AI-first software company
Supportive and inclusive culture that values diversity and promotes the advancement of underrepresented groups within the company
Collaborate with a diverse (currently more than 10 nationalities) and talented team, working on cutting-edge projects with real-world impact
Network with professionals and leaders in the field, opening doors to potential future career opportunities
We have a very flat hierarchy, open 360° feedback, and flexible working hours
Ethics⚖: We are committed to developing ethical AI software
Don't meet all the requirements?
Deltia is committed to creating a workplace that is diverse, fair, and inclusive. We encourage candidates from all backgrounds, even if they do not meet every qualification, to submit their application. We firmly believe that having a team with diverse perspectives only strengthens our company and drives innovation. Our commitment also extends to providing an accessible environment for everyone, including those with disabilities. Please let us know if you require any accommodations during the application process or while working with us, and we will do our best to support you.