Company hidden
2 hours ago

Research Scientist - RL Training (AI)

$200,000 - $325,000
Work format
remote (USA only)/hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Research Scientist - RL Training (AI): Developing reinforcement learning techniques and data pipelines to steer LLM behavior, with an emphasis on reward modeling and preference datasets. The role focuses on implementing RLHF, DPO, and GRPO to produce high-quality training corpora for frontier AI labs.

Location: Hybrid in Redwood City or San Francisco, CA, or Remote within the United States

Salary: $200,000 - $325,000 USD

Company

hirify.global helps enterprises transform expert knowledge into specialized AI at scale by focusing on the data used to build AI systems.

What you will do

  • Research and implement RL techniques (GRPO, RLHF, RLAIF, DPO) to create data products like preference datasets and reward signals.
  • Design and build data pipelines for high-quality RL training signals and AI-assisted annotation to improve model generalization.
  • Prototype end-to-end RL training recipes to inform data-as-a-service deliveries.
  • Collaborate with research, engineering, and delivery teams to translate RL research into customer-ready data products.
  • Stay current with multi-node LLM training, alignment research, and scalable RL methods.
  • Contribute to research publications and the internal knowledge base.

Requirements

  • Deep expertise in RLHF, reward modeling, and credit attribution.
  • Experience training or fine-tuning large language models with 30B+ parameters at scale using distributed training infrastructure.
  • Proficiency in Python, PyTorch, Hugging Face, and RL frameworks such as verl and SkyRL.
  • Strong software engineering fundamentals to build extensible research prototypes.
  • Familiarity with AWS, GCP, Kubernetes, or Slurm.
  • Ph.D. in machine learning, reinforcement learning, or a related field strongly preferred.

Culture & Benefits

  • Opportunity to shape priorities and strategic decisions in a rapidly scaling company.
  • Support for deepening technical expertise and exploring leadership opportunities.
  • Stability backed by robust funding, combined with the excitement of high growth.
  • Inclusive work environment committed to diversity and equal employment opportunities.
