Researcher, Alignment Science (AI)
Описание вакансии
TL;DR
Researcher, Alignment Science (AI): Design and implement experiments to ensure frontier models follow user intent and remain honest, with an emphasis on reinforcement learning, calibration, and robustness. Focus on developing scalable alignment methods, building evaluations for failure modes like hallucination and reward hacking, and integrating these techniques into model deployment.
Location: Hybrid in San Francisco, CA (3 days in office). Open to exceptional remote candidates. Relocation assistance provided.
Salary: $250,000 – $445,000 + Equity
Company
An AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
What you will do
- Design and implement alignment experiments focused on intent following, honesty, calibration, and robustness.
- Train and evaluate models using reinforcement learning and other empirical ML methods.
- Develop evaluations for failure modes such as hallucination, reward hacking, covert actions, and scheming.
- Build monitoring and inference-time interventions to ensure compliant behavior.
- Investigate how alignment methods scale with model capability, compute, data, and adversarial pressure.
- Integrate successful alignment techniques into model training and deployment workflows.
Requirements
- Strong hands-on experience training, evaluating, or debugging large ML models, especially LLMs.
- Excellent engineering skills in Python and modern ML frameworks such as PyTorch.
- Mathematical rigor and ability to turn ambiguous research questions into measurable experiments.
- Experience with reinforcement learning, post-training, preference optimization, or scalable oversight.
- Ability to operate with high independence in a fast-paced, collaborative research environment.
- Strong record in technical problem solving (e.g., competitive programming, math contests, or rigorous engineering projects).
Culture & Benefits
- Hybrid work model (3 days in office per week).
- Relocation assistance for new employees.
- Opportunity to produce externally publishable research that advances the science of alignment.
- Collaborative environment working across post-training, RL, and safety teams.
- Focus on building trustworthy, honest, and reliable AI systems for high-stakes settings.