TL;DR
Research Scientist (AI/LLM Frontier Models): Build and optimize post-training for frontier models, especially Gemini, with an emphasis on architecting Reward Modeling and Reinforcement Learning strategies for hard capabilities such as chain-of-thought reasoning. Focus on designing novel post-training pipelines, advancing reward models, and solving the "flywheel" challenge of continuous model improvement across multimodal domains.
Location: Zurich, Switzerland (Onsite)
Company
hirify.global is a team of scientists and engineers working to advance state-of-the-art AI, focusing on widespread public benefit, scientific discovery, safety, and ethics.
What you will do
- Design and validate novel post-training pipelines (SFT, RLHF, RLAIF) specifically for frontier-class models.
- Lead research into next-gen Reward Models, including investigating new architectures and improving signal-to-noise ratios.
- Develop innovative methods to improve the model's internal reasoning (chain-of-thought), focusing on correctness, logic, and self-correction.
- Critically re-evaluate and optimize RL prompts and feedback mechanisms to extract maximum performance from base models.
- Create robust mechanisms to turn user signals and interactions into training data for continuous model improvement.
- Collaborate across teams to apply advanced recipes to various model sizes and modalities (e.g., Audio).
Requirements
- PhD in machine learning, artificial intelligence, or computer science (or equivalent practical experience).
- Strong background in Large Language Models (LLMs), Reinforcement Learning (RL), or preference learning.
- Research interest in aligning AI systems with human feedback and utility.
- Familiarity with experiment design and analyzing large-scale user data.
- Strong coding and communication skills.
Nice to have
- Experience with RLHF (Reinforcement Learning from Human Feedback) or DPO (Direct Preference Optimization).
- Experience building or improving reward models and conducting human evaluation studies.
- A proven track record of publications in top-tier conferences (e.g., NeurIPS, ICML, ICLR).
- Experience with Chain-of-Thought (CoT) reasoning research or process-based supervision.
- Deep understanding of, and experience with, training models from scratch or using self-play/self-improvement techniques.
Culture & Benefits
- Fosters an environment where ambitious, long-term research flourishes.
- Committed to diversity of experience, knowledge, backgrounds, and perspectives.
- Ensures safety and ethics are the highest priority in AI development.
- Provides equal employment opportunity regardless of protected characteristics.
- Offers accommodation for disabilities or additional needs.