Research Scientist, RL Training (AI)
Job Description
TL;DR
Research Scientist, RL Training (AI): Develop reinforcement learning techniques and data pipelines to align large language models, with an emphasis on reward modeling and preference datasets. Focus on implementing RLHF, DPO, and GRPO to create high-quality training signals for frontier AI labs.
Location: Hybrid in Redwood City or San Francisco, CA, or Remote within the United States
Salary: $200,000 - $275,000 USD
Company
helps enterprises transform expert knowledge into specialized AI at scale through a data-centric approach.
What you will do
- Research and implement RL techniques including GRPO, RLHF, RLAIF, and DPO to create data products for LLM fine-tuning.
- Design and build data pipelines for AI-assisted annotation and curation to improve model generalization.
- Prototype end-to-end RL training recipes that inform data-as-a-service deliveries.
- Collaborate with research, engineering, and delivery teams to translate RL research into customer-ready products.
- Stay current with large-scale multi-node LLM training, alignment research, and scalable RL methods.
- Contribute to internal knowledge bases and research publications in RL and model training.
Requirements
- Deep expertise in RL from human or AI feedback, reward modeling, and credit attribution.
- Experience training or fine-tuning 30B+ parameter large language models at scale using distributed training infrastructure.
- Proficiency in Python, PyTorch, HuggingFace, and RL frameworks such as Verl and SkyRL.
- Strong software engineering fundamentals for building reproducible research prototypes.
- Familiarity with cloud platforms and ML infrastructure (AWS, GCP, Kubernetes, Slurm).
- Ph.D. in machine learning, reinforcement learning, or a related field strongly preferred.
Culture & Benefits
- Environment combining the stability of proven solutions with the excitement of high growth.
- Opportunities to shape strategic priorities and influence key decisions.
- Support for career development and learning across multiple technical functions.
- Equal Employment Opportunity employer committed to diversity and inclusion.