Researcher (Reinforcement Learning)
Job description
TL;DR
Researcher (Reinforcement Learning): Develop novel reinforcement learning techniques that use synthetic data, environments, and feedback to train and evaluate frontier AI models, with an emphasis on self-play, simulators, and other synthetic evaluations. The role focuses on designing experiments, analyzing learning dynamics, and translating research insights into production training approaches.
Location: Hybrid (San Francisco, CA) with 3 days in office per week. Relocation assistance to San Francisco, CA is offered.
Salary: $310,000–$460,000 + equity
Company
An AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
What you will do
- Research and develop reinforcement learning algorithms.
- Design and run experiments to study training dynamics and model behavior at scale.
- Collaborate with engineers and researchers to integrate successful approaches into model training pipelines.
Requirements
- Strong background in reinforcement learning, machine learning research, or related fields.
- Strong engineering and statistical analysis skills.
- Enjoys exploring new problem spaces where data, objectives, and evaluation are imperfect or evolving.
- Motivated by seeing research ideas influence real-world AI systems.
Culture & Benefits
- Work on open-ended problems with a focus on fast iteration.
- Directly shape how frontier models are trained.
- Committed to providing reasonable accommodations to applicants with disabilities.
- An equal opportunity employer, promoting diversity and inclusion.