Member of Technical Staff - Post-Training and RL (AI)
Job description
TL;DR
Member of Technical Staff - Post-Training and RL (AI): Develop critical post-training and reinforcement learning systems that improve reasoning, truthfulness, and real-world capabilities, with an emphasis on reward modeling and preference optimization. The role focuses on implementing RLHF/DPO techniques and advancing model alignment and performance.
Location: Palo Alto, CA
Salary: $180,000 - $600,000 USD
Company
is focused on creating AI systems that accurately understand the universe and aid humanity in its pursuit of knowledge.
What you will do
- Address critical post-training and reinforcement learning challenges.
- Develop and optimize reward modeling and preference optimization systems (RLHF/DPO).
- Improve model reasoning, truthfulness, and real-world capabilities.
- Contribute directly to the company's mission within a flat organizational structure.
Requirements
- Strong belief in the importance of truth-seeking AI.
- Obsession with building useful models via post-training and RL techniques.
- Experience as a power user of AI models with a drive to push RL and alignment boundaries.
- Must be located in or able to work from Palo Alto, CA.
Nice to have
- Previous professional experience with post-training or RLHF.
- Experience training AI models that have been deployed to millions of users.
Culture & Benefits
- Flat organizational structure based on meritocracy and initiative.
- Competitive base salary and equity package.
- Comprehensive medical, vision, and dental coverage.
- 401(k) retirement plan, plus life insurance and short-term and long-term disability insurance.