Company hidden
5 days ago

Distributed Training Engineer (AI)

Work format
remote (USA only)/hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Distributed Training Engineer (AI): Develop and optimize large-scale distributed LLM training systems for scientific research, with an emphasis on distributed training frameworks and high-throughput GPU cluster performance. The role focuses on debugging complex training workflows, contributing to open-source frameworks, and supporting frontier-scale experiments in a high-impact lab environment.

Location: Based in Menlo Park, California, or remote within the United States.

Company

An AI and physical sciences lab building state-of-the-art models to accelerate novel scientific discoveries.

What you will do

  • Optimize, operate, and develop large-scale distributed LLM training systems.
  • Collaborate with researchers to bring up, debug, and maintain training and reinforcement learning workflows.
  • Build tools to support frontier-scale experiments in physics and materials science.
  • Contribute to open-source large-scale LLM training frameworks.
  • Maintain system performance for massive-scale model development.

Requirements

  • Experience training models on clusters with 5,000 or more GPUs.
  • Proficiency with 5D-parallel LLM training.
  • Expertise in distributed training frameworks such as Megatron-LM, FSDP, DeepSpeed, or TorchTitan.
  • Ability to optimize training throughput for large-scale Mixture-of-Experts (MoE) models.
  • Must be based in the United States.

Culture & Benefits

  • Work in a well-funded, rapidly growing lab environment.
  • Ownership-based culture with minimal bureaucracy.
  • Opportunities to learn new tools at the intersection of AI and physical sciences.
  • Direct contribution to groundbreaking scientific research.
