Staff Storage Software Engineer (AI)
Job description
TL;DR
Staff Storage Software Engineer (AI/Distributed Systems): Design and deploy high-performance storage protocol solutions at scale to power AI cloud infrastructure, with an emphasis on performance, scalability, and reliability. The role focuses on implementing storage protocol APIs, optimizing data paths for AI workloads, and providing technical leadership for petabyte-scale deployments.
Location: Hybrid. Must be based in, or able to commute to, the San Francisco, San Jose, or Bellevue office (4 days per week)
Salary: $314,000 – $465,000
Company
A leader in AI cloud infrastructure, providing high-performance compute, networking, and storage for AI researchers and enterprises.
What you will do
- Set technical direction for storage software architecture and influence decisions for petabyte-scale deployments.
- Design and optimize storage protocol APIs across file (NFS, SMB, Lustre), block (NVMe-oF, iSCSI), and object (S3) patterns.
- Develop distributed systems for orchestrating storage resources and integrating with NVMe, GPU-direct storage, and DPUs.
- Mentor senior engineers and provide guidance on systems design and debugging of complex distributed systems.
- Collaborate with Kubernetes and observability teams to define and track SLOs/SLIs for storage systems.
- Prototype next-generation storage solutions and optimize I/O for AI training checkpoints and inference pipelines.
Requirements
- 10+ years of experience in storage systems engineering, with at least 5 years in a Staff+ or technical lead IC role.
- Strong proficiency in low-level systems programming: C, C++, Rust, or Go.
- Deep hands-on experience implementing storage protocol servers or clients (S3, iSCSI, NVMe-oF, NFS, etc.).
- Proven track record of designing and operating multi-petabyte storage infrastructure in production data centers.
- Expertise in storage performance profiling using tools like fio, blktrace, perf, or eBPF.
- Must be based in the USA and able to work from the SF, San Jose, or Bellevue office 4 days per week.
Nice to have
- Experience with NVIDIA BlueField DPUs, SuperNICs, or GPUDirect Storage implementation.
- Production experience with Vast Data, Weka, NetApp, or IBM Spectrum Scale.
- Experience operating Ceph at 100PB+ scale in HPC or AI environments.
- Contributions to open-source storage projects such as Ceph, DAOS, Lustre, or MinIO.
Culture & Benefits
- Generous cash and equity compensation.
- Comprehensive health, dental, and vision coverage for employees and dependents.
- 401(k) plan with a 2% company match for US employees.
- Flexible paid time off plan.
- Wellness and commuter stipends for select roles.