Manager, HPC Storage Engineer (AI)

$150,000 - $240,000

Work format

remote (USA only)

Employment type

full-time

Grade

lead

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Manager, HPC Storage Engineer (AI): Build and operate global distributed storage platforms for AI training and inference, with an emphasis on high-performance shared filesystems and low-latency data paths. The role focuses on designing SAN/NFS architectures, optimizing NVMe/RDMA performance, and scaling storage infrastructure for GPU clusters.

Location: Remote, USA

Salary: $150,000 - $240,000 USD

Company

hirify.global is pioneering the future of AI and machine learning, offering cutting-edge cloud infrastructure for full‑stack AI applications.

What you will do

Define and evolve the global distributed storage architecture supporting training, inference, and dataset access at scale.
Manage and grow a team of storage and systems engineers, setting clear technical direction and operational standards.
Design and operate large-scale SAN and NFS deployments, specifically leveraging VAST Data and parallel filesystems like Lustre.
Drive end-to-end performance optimization from NAND/NVMe media through controllers, networking, and client access patterns.
Evaluate and deploy emerging capabilities such as NFS over RDMA and GPUDirect Storage (GDS).
Partner with Datacenter Networking, GPU Platform, and SRE teams to ensure storage systems meet AI workload requirements.

Requirements

3+ years of experience managing storage, systems, or infrastructure engineering teams in production.
8+ years of experience designing and operating multi-petabyte distributed storage systems (SAN/NFS).
Hands-on experience deploying and operating VAST Data in production environments is required.
Experience with parallel filesystems such as Lustre, GPFS, or BeeGFS.
Deep understanding of NAND, NVMe, PCIe, and Linux internals (I/O scheduling, memory management, and performance tuning).
Must be based in the USA.

Nice to have

Experience supporting AI training pipelines, large-scale model checkpointing, and dataset streaming.
Familiarity with RDMA fabrics and collaboration with datacenter networking teams.
Experience designing storage for multi-tenant isolation and secure data access.
Background in hyperscale, HPC, or AI-focused infrastructure environments.

Culture & Benefits

Meaningful equity and stock options for all employees.
100% coverage for medical, dental, and vision plans.
Flexible PTO to ensure work-life balance.
Remote-first culture with a collaborative team that works over Slack.
