Job description
TL;DR
Research Engineer (RL Scaling Science): Design and run large-scale RL experiments to develop training recipes for frontier models, with an emphasis on scaling laws, compute efficiency, and task horizons. The role focuses on building benchmarks for long-horizon RL and resolving complex bottlenecks at the intersection of research and infrastructure.
Location: Hybrid. Must be based in London, UK (minimum 25% office presence).
Salary: £375,000 - £640,000
Company
Anthropic is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.
What you will do
- Design, run, and interpret large-scale RL experiments to understand scaling behavior across model size and compute.
- Build and maintain benchmarks for long-horizon RL to ensure progress is measurable and reproducible.
- Translate validated research findings into production training recipes for frontier models.
- Debug complex failures occurring at the seam where research meets large-scale infrastructure.
- Collaborate with adjacent RL teams to advance the overall RL stack and capabilities.
Requirements
- Strong empirical research skills in Reinforcement Learning or large-scale ML training.
- Ability to own large experiments end-to-end, from design through interpretation.
- Proficiency in Python and experience with distributed ML systems.
- Comfort operating and debugging at the research/systems boundary.
- Bachelor's degree or equivalent combination of education and professional experience.
Nice to have
- Published or shipped work in long-horizon RL or RL fundamentals.
- Experience translating research findings into production training recipes.
- Demonstrated large-scale industry impact via RL interventions.
- Experience working on frontier-scale training runs with long trajectories.
Culture & Benefits
- Collaborative "big science" environment focused on high-impact research rather than small puzzles.
- Competitive compensation with optional equity donation matching.
- Generous vacation and parental leave policies.
- Flexible working hours and high-quality collaborative office space.
- Visa sponsorship available for eligible candidates.