Company hidden
5 hours ago

Senior/Staff RL Engineer (AI)

Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: Netherlands/Switzerland
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior/Staff RL Engineer (AI): Build and scale reinforcement learning training infrastructure for agentic clinical AI assistants, with an emphasis on reward pipelines, distributed training, and complex model alignment challenges. The role focuses on diagnosing training instabilities, implementing novel RL algorithms, and ensuring system reliability in high-stakes clinical environments.

Location: Must be based in the Netherlands or Switzerland, with at least 50% of working time expected in the office.

Company

hirify.global is a well-funded company building a next-generation agentic clinical AI assistant designed to support clinicians with patient data navigation, diagnostics, and care coordination.

What you will do

  • Own the end-to-end RL training stack, ensuring scalability across large MoE models and long contexts.
  • Develop and maintain reward pipelines, including LLM-based reward models and complex shaping strategies.
  • Debug and resolve training instabilities such as reward hacking, entropy collapse, and credit assignment failures.
  • Explore and implement new RL algorithms, translating research results into production-ready code.
  • Scale training runs across multiple nodes and complex parallelism configurations.
  • Contribute to open-source frameworks by addressing bugs and implementing missing features.

Requirements

  • Must be based in the Netherlands or Switzerland.
  • Deep hands-on experience shipping and scaling RL or post-training systems.
  • Fluency in at least one distributed training framework with the ability to debug source-level failures.
  • Strong understanding of core RL challenges like exploration, sample efficiency, and reward design.
  • Excellent software engineering skills with clean, typed Python code and a focus on reproducibility.
  • Ability to operate independently, taking systems from initial implementation to stable production performance.

Nice to have

  • Experience with verifiable reward signals or LLM-as-judge pipelines.
  • Familiarity with inference serving systems in an RL rollout loop.
  • Experience with MoE training complexity.
  • Exposure to agentic or tool-use RL (web search, code execution).
  • Background in healthcare or regulated-deployment environments.

Culture & Benefits

  • Competitive salary, pension plan, and 25 days of vacation per year.
  • EUR 1000 annual learning and development budget.
  • Regular team offsites and events.
  • Annual commuting subsidy.
  • Flexible work environment focused on ownership and autonomy.

Hiring process

  • Screening call to align on motivation and professional goals.
  • Technical interview with an offline assignment.
  • Onsite meeting to explore collaboration dynamics and team fit.
  • Final executive conversation focused on long-term impact and alignment.
