Company hidden
2 days ago

Research Engineer (RL Infrastructure)

$150,000 – $300,000
Work format
remote (Global)/hybrid/onsite
Employment type
full-time
English
B2
Country
US
Relocation
US
Vacancy from Hirify.Global, a list of international tech companies

Job description

TL;DR

Research Engineer (RL Infrastructure): Building and optimizing systems infrastructure for large-scale reinforcement learning and distributed training, with an emphasis on kernel optimization, memory efficiency, and distributed scaling. Focus on pushing training performance toward hardware limits and developing the stack for open superintelligence.

Location: Flexible work arrangements; remote or in-person in San Francisco. Visa sponsorship and relocation support provided for international candidates.

Salary: $150,000–$300,000, plus equity.

Company

hirify.global is building an open superintelligence stack, unifying distributed compute with reinforcement learning post-training systems to enable frontier-scale model adaptation.

What you will do

  • Build and optimize systems infrastructure for large-scale RL and distributed training.
  • Improve efficiency across compute, memory, networking, and scheduling layers.
  • Design and implement low-level optimizations including CUDA/Triton kernels.
  • Work on distributed training systems spanning data, tensor, and pipeline parallelism.
  • Contribute to internal infrastructure and open-source libraries for frontier model training.
  • Collaborate with researchers to translate performance bottlenecks into concrete systems improvements.
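To make the parallelism bullet above concrete, here is a minimal, dependency-free Python sketch of the core sharding arithmetic behind data and tensor parallelism. It is illustrative only, not the company's actual stack; the function names are hypothetical.

```python
# Illustrative sketch: the sharding arithmetic behind data and tensor
# parallelism, in plain Python (no GPUs or frameworks required).

def shard_batch(batch, world_size):
    """Data parallelism: split a global batch into per-rank micro-batches."""
    per_rank = len(batch) // world_size
    return [batch[r * per_rank:(r + 1) * per_rank] for r in range(world_size)]

def shard_columns(matrix, world_size):
    """Tensor parallelism: column-shard a weight matrix across ranks,
    so each rank computes a slice of the output features."""
    cols = len(matrix[0])
    per_rank = cols // world_size
    return [
        [row[r * per_rank:(r + 1) * per_rank] for row in matrix]
        for r in range(world_size)
    ]

if __name__ == "__main__":
    batch = list(range(8))                 # 8 samples across 4 data-parallel ranks
    print(shard_batch(batch, 4))

    weight = [[1, 2, 3, 4], [5, 6, 7, 8]]  # 2x4 weight across 2 tensor-parallel ranks
    print(shard_columns(weight, 2))
```

In real frameworks (FSDP, DeepSpeed, Megatron-style tensor parallelism) the same partitioning logic is applied to tensors on device, with collectives (all-reduce/all-gather) stitching the shards back together.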

Requirements

  • Strong systems engineering experience in AI/ML infrastructure.
  • Deep familiarity with PyTorch and distributed training frameworks like DeepSpeed, FSDP, or Ray.
  • Hands-on experience with large-scale training techniques including data and tensor parallelism.
  • Understanding of GPU architecture, profiling, and performance debugging.
  • Ability to identify bottlenecks and drive performance improvements from first principles.
  • Comfort working in a fast-moving, high-ownership startup environment.
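"Identifying bottlenecks from first principles" often starts with a roofline-style estimate: compare a kernel's arithmetic intensity (FLOPs per byte moved) against the hardware's compute-to-bandwidth ratio. A small sketch, with rough A100-class hardware numbers that are assumptions, not a spec:

```python
# First-principles bottleneck check: is a matmul compute- or memory-bound?
# Hardware constants are rough, A100-class figures (assumed for illustration).

PEAK_FLOPS = 312e12   # ~bf16 tensor-core peak, FLOP/s
PEAK_BW = 2.0e12      # ~HBM bandwidth, bytes/s

def matmul_intensity(m, n, k, bytes_per_elem=2):
    """FLOPs per byte for an (m,k) x (k,n) matmul in bf16."""
    flops = 2 * m * n * k                                   # multiply-adds
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)  # read A, B; write C
    return flops / bytes_moved

def is_compute_bound(intensity):
    # Above the "ridge point" (peak FLOPs / peak bandwidth) the kernel
    # is limited by compute; below it, by memory traffic.
    return intensity > PEAK_FLOPS / PEAK_BW

large = matmul_intensity(4096, 4096, 4096)  # big GEMM: high data reuse
small = matmul_intensity(4096, 1, 4096)     # GEMV-like: streams the weights once

print(f"large GEMM: {large:.0f} FLOP/B, compute-bound: {is_compute_bound(large)}")
print(f"GEMV-like:  {small:.2f} FLOP/B, compute-bound: {is_compute_bound(small)}")
```

The same back-of-envelope reasoning explains why decode-time GEMVs in inference and small per-step RL batches tend to be memory-bound, and why kernel fusion and larger batches help.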

Nice to have

  • Experience writing or optimizing CUDA or Triton kernels.
  • Experience with compiler or runtime optimization for ML systems.
  • Background in RL training infrastructure or asynchronous training pipelines.
  • Experience with multi-node GPU clusters and high-performance networking.
  • Contributions to open-source ML systems or technical writing.

Culture & Benefits

  • Equity included in compensation package.
  • Support for visa sponsorship and relocation.
  • Quarterly team offsites, hackathons, and conference attendance.
  • High-agency, deeply technical work environment.
  • Flexible work arrangements (remote/in-office).

Be careful: if an employer asks you to log in to their system via iCloud/Google, to send a code/password, or to run code/software, do not do it - these are scammers. Be sure to click "Report" or contact support. More details in the guide →