Research Scientist - RL Training (AI)
Job Description
TL;DR
Research Scientist - RL Training (AI): Develop reinforcement learning (RL) techniques and data pipelines that steer LLM behavior, with an emphasis on reward modeling and preference datasets. The focus is on implementing RLHF, DPO, and GRPO to produce high-quality training corpora for frontier AI labs.
Location: Hybrid in Redwood City or San Francisco, CA, or Remote within the United States
Salary: $200,000 - $325,000 USD
Company
The company helps enterprises transform expert knowledge into specialized AI at scale by focusing on the data used to build AI systems.
What you will do
- Research and implement RL techniques (GRPO, RLHF, RLAIF, DPO) to create data products such as preference datasets and reward signals.
- Design and build data pipelines for high-quality RL training signals and AI-assisted annotation to improve model generalization.
- Prototype end-to-end RL training recipes to inform data-as-a-service deliveries.
- Collaborate with research, engineering, and delivery teams to translate RL research into customer-ready data products.
- Stay current with multi-node LLM training, alignment research, and scalable RL methods.
- Contribute to research publications and the internal knowledge base.
Requirements
- Deep expertise in RLHF, reward modeling, and credit assignment.
- Experience training or fine-tuning large language models of 30B+ parameters at scale using distributed training infrastructure.
- Proficiency in Python, PyTorch, HuggingFace, and RL frameworks such as Verl and SkyRL.
- Strong software engineering fundamentals to build extensible research prototypes.
- Familiarity with AWS, GCP, Kubernetes, or Slurm.
- Ph.D. in machine learning, reinforcement learning, or a related field strongly preferred.
Culture & Benefits
- Opportunity to shape priorities and strategic decisions in a rapidly scaling company.
- Support for deepening technical expertise and exploring leadership opportunities.
- The stability of robust funding combined with the excitement of high growth.
- Inclusive work environment committed to diversity and equal employment opportunities.