Posted 2 days ago

Research Engineer (RL Scaling Science)

375,000 - 640,000 GBP
Work format
Hybrid
Employment type
Full-time
English
B2
Country
UK

Job description

TL;DR

Research Engineer (RL Scaling Science): Design and run large-scale RL experiments to develop training recipes for frontier models, with an emphasis on scaling laws, compute efficiency, and task horizons. The role also covers building benchmarks for long-horizon RL and resolving complex bottlenecks at the intersection of research and infrastructure.

Location: Hybrid; must be based in London, UK (minimum 25% office presence)

Salary: £375,000 - £640,000

Company

Anthropic is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.

What you will do

  • Design, run, and interpret large-scale RL experiments to understand scaling behavior across model size and compute.
  • Build and maintain benchmarks for long-horizon RL to ensure progress is measurable and reproducible.
  • Translate validated research findings into production training recipes for frontier models.
  • Debug complex failures occurring at the seam where research meets large-scale infrastructure.
  • Collaborate with adjacent RL teams to advance the overall RL stack and capabilities.

Requirements

  • Strong empirical research skills in Reinforcement Learning or large-scale ML training.
  • Ability to own large experiments end-to-end, from design through interpretation.
  • Proficiency in Python and experience with distributed ML systems.
  • Comfort operating and debugging at the research/systems boundary.
  • Bachelor's degree or equivalent combination of education and professional experience.

Nice to have

  • Published or shipped work in long-horizon RL or RL fundamentals.
  • Experience translating research findings into production training recipes.
  • Demonstrated large-scale industry impact via RL interventions.
  • Experience working on frontier-scale training runs with long trajectories.

Culture & Benefits

  • Collaborative "big science" environment focusing on high-impact research over small puzzles.
  • Competitive compensation with optional equity donation matching.
  • Generous vacation and parental leave policies.
  • Flexible working hours and high-quality collaborative office space.
  • Visa sponsorship available for eligible candidates.
