HPC Engineer (AI)

$350,000–$450,000

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

HPC Engineer (AI): Design and operate high-performance computing infrastructure for AI and scientific research, with an emphasis on GPU/CPU clusters and high-speed interconnects. The role focuses on optimizing workload scheduling, improving parallel storage throughput, and scaling compute environments to accelerate scientific discovery.

Location: Menlo Park or San Francisco preferred, but flexible depending on the role

Compensation: $350,000–$450,000

Company

An AI and physical sciences company building state-of-the-art models to accelerate breakthroughs across materials and energy.

What you will do

Design, deploy, and operate large-scale GPU and CPU clusters for AI training and scientific simulation.
Optimize high-speed interconnect fabrics (InfiniBand, RoCE) and parallel filesystems (Lustre, GPFS, WEKA).
Manage workload scheduling and resource allocation using Slurm, Kubernetes, or similar tools.
Implement automated cluster provisioning and configuration management via Ansible and Terraform.
Partner with research and ML teams to profile workloads and tune hardware/software stacks.
Establish standards for HPC operations, capacity planning, and disaster recovery strategies.

Requirements

Experience designing and operating large-scale HPC or GPU clusters.
Deep knowledge of InfiniBand (HDR/NDR) or RoCE fabric management and troubleshooting.
Hands-on experience with parallel storage systems like Lustre, GPFS, or WEKA.
Proficiency in Linux systems administration, including kernel tuning and GPU driver management.
Experience with GPU computing environments (CUDA, NCCL, MPI) and multi-node distributed training.
Bachelor’s degree or equivalent experience.

Nice to have

Experience with multi-node transformer training for large-scale AI/ML workloads.
Familiarity with AI accelerators such as TPUs or Trainium.
Background in computational chemistry, physics simulation, or bioinformatics.
Experience with containerized HPC environments (Singularity, Apptainer).
Contributions to open-source HPC tooling.

Culture & Benefits

Visa sponsorship is provided with full legal support.
Fast-paced environment focused on defining the frontier of AI in the physical world.
Opportunity to work with world-class investors and top-tier scientists.
Direct impact on breakthroughs in materials and energy science.
