Company hidden
2 days ago

HPC Storage Engineer

Work format
Remote
Employment type
Full-time
Grade
Middle/Senior
English
C1

Job description

TL;DR

HPC Storage Engineer (Lustre/GPFS/BeeGFS): Design, operate, and optimize high-throughput parallel filesystems for demanding compute workloads, with an emphasis on I/O performance tuning and operational resilience. The role focuses on diagnosing complex metadata bottlenecks, automating lifecycle management, and ensuring high availability for AI/ML and scientific workloads.

Location: Remote

Company

hirify.global provides specialized high-performance computing operations and infrastructure engineering.

What you will do

  • Deliver stable, high-performance parallel filesystem services using Lustre, IBM Spectrum Scale (GPFS), or BeeGFS.
  • Optimize throughput, latency, and metadata performance through data-path tuning and evidence-based benchmarking.
  • Design and evolve filesystem architectures, including MDS/MDT, OSS/OST, targets, pools, and tiering.
  • Perform detailed I/O performance analysis using tools such as IOR, mdtest, and fio to validate architectural changes.
  • Automate repeatable builds, patching, and health checks using Ansible and scripting.
  • Collaborate with HPC, networking, and platform teams to align storage behavior with real workload patterns.

Requirements

  • 3–7+ years of experience operating and supporting HPC storage platforms and parallel filesystems in production.
  • Strong Linux administration skills (RHEL/Rocky/Ubuntu), including performance tuning and troubleshooting.
  • Proven expertise in the installation, upgrade, and day-2 operations of Lustre, IBM Spectrum Scale (GPFS), or BeeGFS.
  • Deep understanding of RDMA, InfiniBand, NVMe, and RAID fundamentals.
  • Proficiency in automation and scripting using Bash and/or Python.
  • Fluent English (written and spoken) for cross-team collaboration and documentation.

Nice to have

  • Experience with HPC schedulers such as Slurm.
  • Familiarity with object storage (S3) or HSM tiering concepts.
  • Exposure to HPC containers including Apptainer, Singularity, or Docker.
