Posted 2 days ago

Research Engineer, Code RL (AI)

Salary: $500,000 - $850,000
Work format: hybrid
Employment type: full-time
English: B2
Country: US
Job from Hirify Global, a list of international tech companies

Job description

TL;DR

Research Engineer, Code RL (AI/RL): Advance models' ability to write, edit, test, and debug software using reinforcement learning, with an emphasis on designing RL environments and reward signals. The role focuses on building scalable RL infrastructure, enhancing model reasoning, and enabling long-horizon autonomous engineering.

Location: Hybrid (San Francisco, CA or New York City, NY); staff are expected to be in one of the offices at least 25% of the time

Salary: $500,000 - $850,000 USD

Company

Anthropic is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.

What you will do

  • Design RL environments and coding tasks to advance models' ability to ship real software end-to-end.
  • Build reward signals and verifiers that define and capture the criteria for high-quality code.
  • Execute training experiments on frontier models and diagnose performance bottlenecks.
  • Improve the speed and reliability of pipelines to enable fast iteration on RL research.
  • Collaborate with alignment and frontier red teams to ensure systems are both capable and safe.

Requirements

  • Strong software engineering skills and deep Python expertise, including async/concurrent programming.
  • Ability to own systems end-to-end and debug across the entire stack.
  • Capacity to balance research exploration with rigorous engineering implementation and experimental design.
  • Deep commitment to code quality, testing, and system performance.
  • Bachelor’s degree or equivalent combination of education and professional experience in a relevant field.

Nice to have

  • Experience with RL, RLHF, post-training, or LLM finetuning.
  • Experience building coding agents, code-execution sandboxes, or developer tooling.
  • Background in program analysis, compilers, verification, or formal methods.
  • Proficiency with PyTorch, large-scale distributed training, and ML system optimization.
  • Experience with CUDA/GPU/TPU kernels and accelerator-performance intuition.

Culture & Benefits

  • Collaborative "big science" research environment focusing on high-impact goals.
  • Competitive compensation with optional equity donation matching.
  • Generous vacation and parental leave policies.
  • Flexible working hours and modern office spaces for collaboration.
  • Visa sponsorship is available for qualified candidates.
