Machine Learning Scientist (AI)
Job description
TL;DR
Machine Learning Scientist (AI): Develop and analyze novel evaluation methodologies for AI models, with an emphasis on human preference signals, model reliability, and alignment. The role focuses on designing large-scale experiments, building statistical frameworks to improve model performance, and translating research findings into production-ready evaluation systems.
Location: Must be based in the Bay Area
Company
LMArena is an open platform created by researchers from UC Berkeley's SkyLab, dedicated to evaluating AI model performance and building transparent, human-centered benchmarks for the global AI community.
What you will do
- Design and conduct experiments to evaluate AI model behavior across reasoning, robustness, and user preference dimensions.
- Develop new metrics, methodologies, and protocols that go beyond traditional benchmarks.
- Analyze large-scale human interaction and voting data to derive insights into model performance.
- Collaborate with engineering and product teams to scale research findings into robust production systems.
- Prototype and test research ideas rapidly while maintaining scientific rigor.
- Contribute to the scientific integrity of the LMArena leaderboard through internal reports and external publications.
Requirements
- PhD or equivalent research experience in Machine Learning, Natural Language Processing, or Statistics.
- Deep understanding of LLMs, modern deep learning architectures such as Transformers, and reinforcement learning.
- Proficiency in Python and research libraries such as PyTorch, JAX, or TensorFlow.
- Demonstrated ability to design experiments with high statistical rigor.
- Track record of publishing research or contributing to open-source ML/AI projects.
- Ability to translate complex research questions into practical, scalable systems.
Culture & Benefits
- Competitive compensation packages with equity.
- Comprehensive health and wellness benefits including medical, dental, and vision coverage.
- Opportunity to contribute to a mission-driven team working on the cutting edge of AI evaluation.
- Collaborative environment valuing transparency, craftsmanship, and curiosity.
- Work with experts from leading institutions like Google, DeepMind, and Stanford.