HPC Storage Engineer

Work format

remote

Employment type

full-time

Level

middle/senior

English

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

HPC Storage Engineer (Lustre/GPFS/BeeGFS): design, operate, and optimize high-throughput parallel filesystems for demanding compute workloads, with an emphasis on I/O performance tuning and operational resilience. The focus is on diagnosing complex metadata bottlenecks, automating lifecycle management, and ensuring high availability for AI/ML and scientific workloads.

Location: Remote

Company

hirify.global provides specialized high-performance computing (HPC) operations and infrastructure engineering services.

What you will do

Deliver stable, high-performance parallel filesystem services using Lustre, IBM Spectrum Scale (GPFS), or BeeGFS.
Optimize throughput, latency, and metadata performance through data-path tuning and evidence-based benchmarking.
Design and evolve filesystem architectures, including metadata servers and targets (MDS/MDT), object storage servers and targets (OSS/OST), storage pools, and tiering.
Perform detailed I/O performance analysis using tools such as IOR, mdtest, and fio to validate architectural changes.
Automate repeatable builds, patching, and health checks using Ansible and scripting.
Collaborate with HPC, networking, and platform teams to align storage behavior with real workload patterns.
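As a rough illustration of what "evidence-based benchmarking" can mean in practice, here is a minimal Python sketch that compares repeated throughput measurements (for example, GiB/s figures collected from several IOR or fio runs) taken before and after a tuning change. The function names, the sample values, and the "gain must exceed 2 standard deviations" rule are hypothetical choices made for this sketch, not a method stated in the role description.

```python
"""Hypothetical helper: decide whether a tuning change produced a real
throughput gain, given repeated benchmark runs before and after it."""
from statistics import mean, stdev


def summarize(samples):
    """Return (mean, sample standard deviation) of throughput values."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate run-to-run variance")
    return mean(samples), stdev(samples)


def is_real_improvement(before, after, k=2.0):
    """Accept a change only if the mean gain exceeds k times the larger
    run-to-run standard deviation. k=2.0 is an illustrative threshold."""
    mean_before, sd_before = summarize(before)
    mean_after, sd_after = summarize(after)
    noise = max(sd_before, sd_after)
    return (mean_after - mean_before) > k * noise


if __name__ == "__main__":
    baseline = [10.1, 10.4, 9.9, 10.2]  # GiB/s, made-up sample values
    tuned = [11.6, 11.9, 11.4, 11.8]    # GiB/s, made-up sample values
    print(is_real_improvement(baseline, tuned))
```

Requiring the gain to clear the measured run-to-run noise, rather than comparing single runs, guards against declaring a win from one lucky benchmark pass; a production workflow would likely use more runs and a proper statistical test.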

Requirements

3–7+ years of experience operating and supporting HPC storage platforms and parallel filesystems in production.
Strong Linux administration skills (RHEL/Rocky/Ubuntu), including performance tuning and troubleshooting.
Proven expertise in the installation, upgrade, and day-2 operations of Lustre, IBM Spectrum Scale (GPFS), or BeeGFS.
Deep understanding of RDMA, InfiniBand, NVMe, and RAID fundamentals.
Proficiency in automation and scripting using Bash and/or Python.
Fluent English (written and spoken) for cross-team collaboration and documentation.

Nice to have

Experience with HPC schedulers such as Slurm.
Familiarity with object storage (S3) or HSM tiering concepts.
Exposure to HPC containers including Apptainer, Singularity, or Docker.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →