TL;DR
Member of Technical Staff (AI): Drive the entire alignment stack, including instruction tuning, RLHF, and RLAIF, to push model performance with an emphasis on factual accuracy and robust instruction following. Focus on designing next-generation reward models, optimizing large-scale RL pipelines, and curating high-quality training data to close complex reasoning gaps.
Location: On-site in San Francisco, London, or New York.
Company
hirify.global is an AI startup developing open-weight foundation models to make superintelligence accessible to all.
What you will do
- Lead end-to-end alignment research covering instruction tuning, RLHF, and RLAIF.
- Design next-generation reward models and optimization objectives to improve performance on human preference benchmarks.
- Develop synthetic data pipelines to address reasoning and behavioral limitations.
- Optimize large-scale RL pipelines for stability and computational efficiency.
- Collaborate with pre-training and evaluation teams to build tight feedback loops for model iteration.
Requirements
- Graduate degree (MS or PhD) in Computer Science, Machine Learning, or a related field.
- Deep technical command of alignment methodologies such as PPO, DPO, and rejection sampling.
- Proven ability to scale alignment techniques to large-scale models.
- Strong engineering proficiency in complex ML codebases and distributed systems.
- Experience owning ambitious research or engineering agendas with measurable model performance gains.
Culture & Benefits
- Top-tier salary and equity package.
- Comprehensive health, dental, vision, and disability insurance.
- Fully paid parental leave for all new parents with family planning financial support.
- Daily lunch and dinner provided at the office.
- Relocation support and regular team off-sites and celebrations.