Company hidden
Posted 5 days ago

Research Scientist (AI/LLM Frontier Models)

Work format: Onsite
Employment type: Full-time
English: B2
Country: Switzerland
Listing from Hirify.Global, a curated list of international tech companies


Job description

TL;DR

Research Scientist (AI/LLM Frontier Models): Build and optimize post-training frontier models, especially Gemini, with an emphasis on architecting Reward Modeling and Reinforcement Learning strategies for hard capabilities such as chain-of-thought reasoning. The focus is on designing novel post-training pipelines, advancing reward models, and solving the "flywheel" challenge of continuous model improvement across multimodal domains.

Location: Zurich, Switzerland (Onsite)

Company

hirify.global is a team of scientists and engineers working to advance state-of-the-art AI, focusing on widespread public benefit, scientific discovery, safety, and ethics.

What you will do

  • Design and validate novel post-training pipelines (SFT, RLHF, RLAIF) specifically for frontier-class models.
  • Lead research into next-gen Reward Models, including investigating new architectures and improving signal-to-noise ratios.
  • Develop innovative methods to improve the model's internal reasoning (chain-of-thought), focusing on correctness, logic, and self-correction.
  • Critically re-evaluate and optimize RL prompts and feedback mechanisms to extract maximum performance from base models.
  • Create robust mechanisms to turn user signals and interactions into training data for continuous model improvement.
  • Collaborate across teams to apply advanced recipes to various model sizes and modalities (e.g., Audio).

Requirements

  • PhD in machine learning, artificial intelligence, or computer science (or equivalent practical experience).
  • Strong background in Large Language Models (LLMs), Reinforcement Learning (RL), or preference learning.
  • Research interest in aligning AI systems with human feedback and utility.
  • Familiarity with experiment design and analyzing large-scale user data.
  • Strong coding and communication skills.

Nice to have

  • Experience with RLHF (Reinforcement Learning from Human Feedback) or DPO (Direct Preference Optimization).
  • Experience building or improving reward models and conducting human evaluation studies.
  • A proven track record of publications in top-tier conferences (e.g., NeurIPS, ICML, ICLR).
  • Experience with Chain-of-Thought (CoT) reasoning research or process-based supervision.
  • Deep understanding and experience training models from scratch or using self-play/self-improvement techniques.

Culture & Benefits

  • Fosters an environment where ambitious, long-term research flourishes.
  • Committed to diversity of experience, knowledge, backgrounds, and perspectives.
  • Ensures safety and ethics are the highest priority in AI development.
  • Provides equal employment opportunity regardless of protected characteristics.
  • Offers accommodation for disabilities or additional needs.
