Company Overview
Dyna Robotics is at the forefront of revolutionizing robotic manipulation with cutting-edge foundation models. Our mission is to empower businesses by automating repetitive, stationary tasks with affordable, intelligent robotic arms. Leveraging the latest advancements in foundation models, we're driving the future of general-purpose robotics, one manipulation skill at a time.
Dyna Robotics was founded by industry leaders who previously achieved a $350 million exit in grocery deep tech, together with top robotics researchers from DeepMind and Nvidia. Our team blends world-class research, engineering, and product innovation to advance robotic manipulation. With more than $20 million in funding, we're positioned to redefine the landscape of robotic automation. Join us to shape the next frontier of AI-driven robotics.
Position Overview
We are seeking a Staff Infrastructure Engineer to lead efforts in designing and optimizing distributed storage systems and caching layers that power our large-scale training and data processing pipelines. This role is critical for ensuring high-throughput, low-latency access to vast datasets across a growing fleet of cloud and on-prem GPUs.
You will focus on building scalable, fault-tolerant storage solutions and intelligent caching strategies to accelerate model iteration and enable real-time data streaming across the ML training stack. While ML experience is not required, you should be passionate about solving complex data movement and storage challenges in high-performance computing environments.
Key Responsibilities
- Architect and maintain high-throughput, scalable distributed file systems (e.g., Lustre, Alluxio, CephFS, or similar).
- Optimize read/write I/O performance for large-scale ML and robotic sensor datasets.
- Design systems that ensure high availability, data integrity, and low-latency access across nodes and regions.
- Develop intelligent caching systems to reduce latency and cloud storage costs (e.g., tiered caching across RAM, NVMe, and object stores).
- Implement prefetching and eviction strategies based on workload patterns.
- Work with researchers to identify data bottlenecks and optimize throughput for common access patterns.
- Lead the design and deployment of data infrastructure that scales to petabytes of logs, video, and training data.
- Evaluate tradeoffs across object storage (e.g., S3, GCS), network-attached storage, and local disk solutions.
- Build robust monitoring and alerting systems for storage health, throughput, and latency.
- Design for failure recovery, redundancy, and data consistency in distributed environments.
- Partner with ML engineers, data engineers, and platform teams to align infrastructure with evolving training and data needs.
- Serve as a domain expert in storage, caching, and I/O optimization across the engineering organization.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field.
- 7+ years of experience building and maintaining infrastructure systems, with 3+ years focused on storage or distributed systems.
- Deep experience with distributed filesystems (e.g., Alluxio, Lustre, CephFS, HDFS) or caching layers (e.g., Memcached, Redis, custom).
- Strong understanding of data locality, throughput optimization, and system bottlenecks in high-performance computing environments.
- Hands-on experience with cloud storage systems (e.g., S3, GCS) and data access performance tuning.
- Solid systems programming skills in C++, Go, or Rust.
- Familiarity with job scheduling, Kubernetes, or HPC cluster management is a plus.
- Clear communication skills and the ability to mentor junior engineers or collaborate across teams.
Preferred Qualifications
- Prior experience working on large-scale data platforms, ML infrastructure, or robotics systems.
- Contributions to open-source storage or caching systems.
- Experience with infrastructure-as-code and container orchestration frameworks.
Benefits
- Competitive salary and equity in a seed-stage, venture-backed startup
- Comprehensive health, dental, and vision insurance
- Flexible work arrangements
- Daily catered lunches and dinners with a fully stocked kitchen
- Professional growth and development through training, mentorship, and challenging projects
If you’re passionate about building data infrastructure that powers the next generation of intelligent robots, we’d love to hear from you.