3 days ago
AI Researcher (Post Training)
Job description
TL;DR
AI Researcher (Post Training): Develop and optimize the post-training pipeline for a natural-language software creation platform, with an emphasis on RFT/RLVR, preference optimization, and code generation. The role focuses on translating cutting-edge research into production-ready training recipes and building scalable evaluation systems.
Location: Stockholm, Sweden
Company
enables millions of users to transform raw ideas into real software products using plain language.
What you will do
- Own the full post-training lifecycle, from data curation and training runs through evaluation and deployment.
- Adapt reinforcement learning, preference optimization, and SFT to improve code generation, reasoning, and agentic reliability.
- Build evaluation and experimentation infrastructure to measure helpfulness, safety, latency, and reliability.
- Develop and operate production systems for large-scale training, including GPU orchestration and data pipelines.
- Collaborate with agent, product, and infrastructure engineers to translate model gains into user-facing improvements.
- Investigate and resolve failures end to end, whether they originate in training recipes, data, or serving regressions.
Requirements
- Hands-on experience running post-training jobs (RFT/RLVR, preference optimization) on LLMs.
- Ability to write reliable production-grade code.
- Proficiency in PyTorch or JAX and experience with distributed training and GPU clusters.
- Strong understanding of the mathematics behind reward modeling, alignment, and preference optimization.
- Experience building evaluation systems that capture real-world quality rather than just benchmarks.
- English: Required (company language)
Nice to have
- Experience with code generation or agentic use cases.
- History of owning the full loop from data curation to production monitoring.
- Ability to rapidly turn research papers into running prototype code.
- Experience with speculative decoding or other model efficiency techniques.
- Contributions to the open-source ML ecosystem or research publications.
Culture & Benefits
- Talent-dense team with a culture of extreme ownership and high velocity.
- Low-ego collaboration environment.
- Fast-paced atmosphere focused on shipping impact to users quickly.