19 hours ago
HPC Storage Engineer
Job description
TL;DR
HPC Storage Engineer (Parallel Filesystems): Designing, operating, and optimizing high-throughput storage platforms for demanding compute workloads, with an emphasis on parallel filesystems such as Lustre, GPFS, and BeeGFS. Focus on eliminating I/O bottlenecks, tuning metadata performance, and ensuring operational resilience for AI/ML and scientific workloads.
Company
The company specializes in high-performance computing infrastructure and operational excellence.
What you will do
- Deploy and manage high-performance parallel filesystem services including Lustre, IBM Spectrum Scale (GPFS), and BeeGFS.
- Optimize I/O throughput, latency, and metadata performance through systematic tuning and data-path optimization.
- Design and evolve filesystem architectures, including MDS/MDT and OSS/OST layout, storage pools, and tiering.
- Develop automation for repeatable builds, patching, and health checks using Ansible, Bash, and Python.
- Perform in-depth performance engineering, using tools such as IOR, mdtest, and fio to validate system changes.
- Collaborate with HPC, Linux, and networking teams to align storage behavior with real-world workload patterns.
Requirements
- 3–7+ years of hands-on experience operating HPC storage platforms and parallel filesystems in production.
- Proven expertise in installation and day-2 operations of Lustre, IBM Spectrum Scale (GPFS), or BeeGFS.
- Strong Linux systems administration skills (RHEL, Rocky, Ubuntu), including performance tuning and troubleshooting.
- Deep understanding of RDMA, InfiniBand, NVMe/SAS, and RAID fundamentals.
- Experience with storage observability, capacity planning, and systematic root-cause analysis.
- Fluent English (written and spoken) for cross-team collaboration and operational documentation.
Nice to have
- Experience with HPC schedulers such as Slurm and an understanding of how they affect storage throughput.
- Familiarity with object storage (S3) or HSM tiering concepts.
- Exposure to HPC container technologies such as Apptainer, Singularity, or Docker.