Company hidden
3 days ago

AI Researcher (Post Training)

Job type: Full-time
English: B2
Country: Sweden
A vacancy from Hirify Global, a list of international tech companies.

Job description


TL;DR

AI Researcher (Post Training): Develop and optimize the post-training pipeline for a natural-language software creation platform, with an emphasis on RFT/RLVR, preference optimization, and code generation. The focus is on translating cutting-edge research into production-ready training recipes and building scalable evaluation systems.

Location: Stockholm, Sweden

Company

hirify.global enables millions of users to transform raw ideas into real software products using plain language.

What you will do

  • Own the full post-training lifecycle, from data curation and training runs through evaluation and deployment.
  • Adapt reinforcement learning, preference optimization, and SFT to improve code generation, reasoning, and agentic reliability.
  • Build evaluation and experimentation infrastructure to measure helpfulness, safety, latency, and reliability.
  • Develop and operate production systems for large-scale training, including GPU orchestration and data pipelines.
  • Collaborate with agent, product, and infrastructure engineers to translate model gains into user-facing improvements.
  • Investigate and resolve failures end to end, whether they stem from training recipes, data, or serving regressions.

Requirements

  • Hands-on experience running post-training jobs (RFT/RLVR, preference optimization) on LLMs.
  • Ability to write reliable production-grade code.
  • Proficiency in PyTorch or JAX and experience with distributed training and GPU clusters.
  • Strong understanding of the mathematics behind reward modeling, alignment, and preference optimization.
  • Experience building evaluation systems that capture real-world quality rather than just benchmarks.
  • English: Required (company language)

Nice to have

  • Experience with code generation or agentic use cases.
  • History of owning the full loop from data curation to production monitoring.
  • Ability to rapidly turn research papers into working prototypes.
  • Experience with speculative decoding or other model efficiency techniques.
  • Contributions to the open-source ML ecosystem or research publications.

Culture & Benefits

  • Talent-dense team with a culture of extreme ownership and high velocity.
  • Low-ego collaboration environment.
  • Fast-paced atmosphere focused on shipping impact to users quickly.
