HPC Storage Engineer

Job type

Full-time

Grade

middle/senior

English

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

HPC Storage Engineer (Parallel Filesystems): Design, operate, and optimize high-throughput storage platforms for demanding compute workloads, with an emphasis on parallel filesystems such as Lustre, GPFS, and BeeGFS. The focus is on eliminating I/O bottlenecks, tuning metadata performance, and ensuring operational resilience for AI/ML and scientific workloads.

Company

hirify.global specializes in high-performance computing infrastructure and operational excellence.

What you will do

Deploy and manage high-performance parallel filesystem services including Lustre, IBM Spectrum Scale (GPFS), and BeeGFS.
Optimize I/O throughput, latency, and metadata performance through systematic tuning and data-path optimization.
Design and evolve filesystem architectures, including MDS/MDT and OSS/OST layout, storage targets, pools, and tiering.
Develop automation for repeatable builds, patching, and health checks using Ansible, Bash, and Python.
Perform deep performance engineering using tools like IOR, mdtest, and fio to validate system changes.
Collaborate with HPC, Linux, and networking teams to align storage behavior with real-world workload patterns.

Requirements

3–7+ years of hands-on experience operating HPC storage platforms and parallel filesystems in production.
Proven expertise in installation and day-2 operations of Lustre, IBM Spectrum Scale (GPFS), or BeeGFS.
Strong Linux systems administration skills (RHEL, Rocky, Ubuntu), including performance tuning and troubleshooting.
Deep understanding of RDMA, InfiniBand, NVMe/SAS, and RAID fundamentals.
Experience with storage observability, capacity planning, and systematic root-cause analysis.
Fluent English (written and spoken) for cross-team collaboration and operational documentation.

Nice to have

Experience with HPC schedulers like Slurm and their impact on storage throughput.
Familiarity with object storage (S3) or HSM tiering concepts.
Exposure to HPC container technologies such as Apptainer, Singularity, or Docker.
