Назад
Company hidden
6 дней назад

Manager, HPC Storage Engineer (AI)

150 000 - 240 000$
Формат работы
remote (только USA)
Тип работы
fulltime
Грейд
lead
Английский
b2
Страна
US
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

Manager, HPC Storage Engineer (AI): Building and operating global distributed storage platforms for AI training and inference with an accent on high-performance shared filesystems and low-latency data paths. Focus on designing SAN/NFS architectures, optimizing NVMe/RDMA performance, and scaling storage infrastructure for GPU clusters.

Location: Remote, USA

Salary: $150,000 - $240,000 USD

Company

hirify.global is pioneering the future of AI and machine learning, offering cutting-edge cloud infrastructure for full‑stack AI applications.

What you will do

  • Define and evolve the global distributed storage architecture supporting training, inference, and dataset access at scale.
  • Manage and grow a team of storage and systems engineers, setting clear technical direction and operational standards.
  • Design and operate large-scale SAN and NFS deployments, specifically leveraging VAST Data and parallel filesystems like Lustre.
  • Drive end-to-end performance optimization from NAND/NVMe media through controllers, networking, and client access patterns.
  • Evaluate and deploy cutting-edge capabilities such as NFS over RDMA and GPU Direct Storage (GDS).
  • Partner with Datacenter Networking, GPU Platform, and SRE teams to ensure storage systems meet AI workload requirements.

Requirements

  • 3+ years of experience managing storage, systems, or infrastructure engineering teams in production.
  • 8+ years of experience designing and operating multi-petabyte scale distributed storage systems (SAN/NFS).
  • Hands-on experience deploying and operating VAST Data in production environments is required.
  • Experience with parallel filesystems such as Lustre, GPFS, or BeeGFS.
  • Deep understanding of NAND, NVMe, PCIe, and Linux internals (I/O scheduling, memory management, and performance tuning).
  • Must be based in the USA.

Nice to have

  • Experience supporting AI training pipelines, large-scale model checkpointing, and dataset streaming.
  • Familiarity with RDMA fabrics and collaboration with datacenter networking teams.
  • Experience designing storage for multi-tenant isolation and secure data access.
  • Background in hyperscale, HPC, or AI-focused infrastructure environments.

Culture & Benefits

  • Meaningful equity and stock options for all employees.
  • 100% coverage for medical, dental, and vision plans.
  • Flexible PTO to ensure work-life balance.
  • Remote-first culture with a collaborative team environment utilizing Slack.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →