Research Scientist, RL Training (AI)

$200,000 - $275,000

Work format

remote (USA only)/hybrid

Employment type

full-time

Grade

senior

English

Country

This vacancy is from Hirify Global, a list of international tech companies

Job description

TL;DR

Research Scientist, RL Training (AI): Developing reinforcement learning techniques and data pipelines to align large language models, with an emphasis on reward modeling and preference datasets. Focus on implementing RLHF, DPO, and GRPO to create high-quality training signals for frontier AI labs.

Location: Hybrid in Redwood City or San Francisco, CA, or Remote within the United States

Salary: $200,000 - $275,000 USD

Company

hirify.global helps enterprises transform expert knowledge into specialized AI at scale through a data-centric approach.

What you will do

Research and implement RL techniques including GRPO, RLHF, RLAIF, and DPO to create data products for LLM fine-tuning.
Design and build data pipelines for AI-assisted annotation and curation to improve model generalization.
Prototype end-to-end RL training recipes that inform data-as-a-service deliverables.
Collaborate with research, engineering, and delivery teams to translate RL research into customer-ready products.
Stay current with large-scale multi-node LLM training, alignment research, and scalable RL methods.
Contribute to internal knowledge bases and research publications in RL and model training.

Requirements

Deep expertise in RL from human or AI feedback, reward modeling, and credit assignment.
Experience training or fine-tuning large language models with 30B+ parameters using distributed training infrastructure.
Proficiency in Python, PyTorch, Hugging Face, and RL frameworks such as Verl and SkyRL.
Strong software engineering fundamentals for building reproducible research prototypes.
Familiarity with cloud platforms and ML infrastructure (AWS, GCP, Kubernetes, Slurm).
Ph.D. in machine learning, reinforcement learning, or a related field strongly preferred.

Culture & Benefits

Environment combining the stability of proven solutions with the excitement of high growth.
Opportunities to shape strategic priorities and influence key decisions.
Support for career development and learning across multiple technical functions.
Equal Employment Opportunity employer committed to diversity and inclusion.