Company hidden
19 hours ago

HPC Storage Engineer

Job type: Full-time
Grade: Middle/Senior
English: C1
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

HPC Storage Engineer (Parallel Filesystems): Designing, operating, and optimizing high-throughput storage platforms for demanding compute workloads, with an emphasis on parallel filesystems such as Lustre, GPFS, and BeeGFS. Focus on eliminating I/O bottlenecks, tuning metadata performance, and ensuring operational resilience for AI/ML and scientific workloads.

Company

hirify.global specializes in high-performance computing infrastructure and operational excellence.

What you will do

  • Deploy and manage high-performance parallel filesystem services including Lustre, IBM Spectrum Scale (GPFS), and BeeGFS.
  • Optimize I/O throughput, latency, and metadata performance through systematic tuning and data-path optimization.
  • Design and evolve filesystem architectures, including MDS/MDT and OSS/OST layout, storage pools, and tiering.
  • Develop automation for repeatable builds, patching, and health checks using Ansible, Bash, and Python.
  • Perform deep performance engineering using tools like IOR, mdtest, and fio to validate system changes.
  • Collaborate with HPC, Linux, and networking teams to align storage behavior with real-world workload patterns.

Requirements

  • 3–7+ years of hands-on experience operating HPC storage platforms and parallel filesystems in production.
  • Proven expertise in installation and day-2 operations of Lustre, IBM Spectrum Scale (GPFS), or BeeGFS.
  • Strong Linux systems administration skills (RHEL, Rocky, Ubuntu), including performance tuning and troubleshooting.
  • Deep understanding of RDMA, InfiniBand, NVMe/SAS, and RAID fundamentals.
  • Experience with storage observability, capacity planning, and systematic root-cause analysis.
  • Fluent English (written and spoken) for cross-team collaboration and operational documentation.

Nice to have

  • Experience with HPC schedulers like Slurm and their impact on storage throughput.
  • Familiarity with object storage (S3) or HSM tiering concepts.
  • Exposure to HPC container technologies such as Apptainer, Singularity, or Docker.
