Company hidden
2 hours ago

HPC Engineer (AI)

$350,000–$450,000
Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

HPC Engineer (AI): Designing and operating high-performance computing infrastructure for AI and scientific research, with an emphasis on GPU/CPU clusters and high-speed interconnects. Focus on optimizing workload scheduling, parallel storage throughput, and scaling compute environments to accelerate scientific discovery.

Location: Preferred in Menlo Park or San Francisco, but flexible based on role

Compensation: $350,000–$450,000

Company

An AI and physical sciences company building state-of-the-art models to accelerate breakthroughs across materials and energy.

What you will do

  • Design, deploy, and operate large-scale GPU and CPU clusters for AI training and scientific simulation.
  • Optimize high-speed interconnect fabrics (InfiniBand, RoCE) and parallel filesystems (Lustre, GPFS, WEKA).
  • Manage workload scheduling and resource allocation using Slurm, Kubernetes, or similar tools.
  • Implement automated cluster provisioning and configuration management via Ansible and Terraform.
  • Partner with research and ML teams to profile workloads and tune hardware/software stacks.
  • Establish standards for HPC operations, capacity planning, and disaster recovery strategies.

Requirements

  • Experience designing and operating large-scale HPC or GPU clusters.
  • Deep knowledge of InfiniBand (HDR/NDR) or RoCE fabric management and troubleshooting.
  • Hands-on experience with parallel storage systems like Lustre, GPFS, or WEKA.
  • Proficiency in Linux systems administration, including kernel tuning and GPU driver management.
  • Experience with GPU computing environments (CUDA, NCCL, MPI) and multi-node distributed training.
  • Bachelor’s degree or equivalent experience.

Nice to have

  • Experience with multi-node transformer training for large-scale AI/ML workloads.
  • Familiarity with AI accelerators such as TPUs or Trainium.
  • Background in computational chemistry, physics simulation, or bioinformatics.
  • Experience with containerized HPC environments (Singularity, Apptainer).
  • Contributions to open-source HPC tooling.

Culture & Benefits

  • Visa sponsorship is provided with full legal support.
  • Fast-paced environment focused on defining the frontier of AI in the physical world.
  • Opportunity to work with world-class investors and top-tier scientists.
  • Direct impact on breakthroughs in materials and energy science.
