Director, Model Post-Training and Agentic Research (AI)
Job description
TL;DR
Director, Model Post-Training and Agentic Research (AI): Build post-training and reinforcement learning capabilities for security-specialized AI systems, with an emphasis on RLHF, RLAIF, and agentic research. The role focuses on designing reward models, building agent harnesses for complex cyber workflows, and developing evaluation methodologies for end-to-end task completion.
Location: Remote (Must be based in the USA)
Salary: $195,000 - $290,000 per year
Company
Global leader in cybersecurity protecting people, processes, and technologies via an AI-native platform.
What you will do
- Own and personally drive the full post-training pipeline, including SFT, RLHF/RLAIF, agent-RL, and reward modeling.
- Build agent-RL training environments and harnesses, focusing on scaffolding, tool-calling interfaces, and planning loops.
- Develop evaluation methodologies for the full agentic stack to measure tool-use reliability and planning coherence.
- Partner with internal teams to integrate post-training and agentic work into the broader model development loop.
- Recruit and lead a team of research scientists and ML engineers with high talent density, setting the technical bar through active contribution.
Requirements
- MS or PhD in Computer Science, Machine Learning, or a related quantitative discipline.
- 8+ years of experience in ML research or engineering with meaningful depth in LLM post-training.
- Hands-on expertise in SFT data pipelines, RLHF/RLAIF, PPO, and reward model design.
- Demonstrated experience building agentic system harnesses, including tool-use frameworks and memory management.
- Track record of running high-velocity research programs with disciplined tracking and iteration.
- Must have US work authorization (E-Verify participant).
Nice to have
- Experience building RL training environments, including rollout infrastructure and reward shaping.
- Applying RL techniques in security or adversarial ML domains.
- Published research in post-training, RLHF, or RL for language agents at top-tier venues (NeurIPS, ICML, ICLR, ACL).
- Experience adapting open-weight base models (Llama, Qwen) for domain-specialized training.
- Familiarity with security practitioner workflows, such as penetration testing or incident response.
Culture & Benefits
- Market leader in compensation and equity awards.
- Comprehensive physical and mental wellness programs.
- Competitive vacation and holidays to recharge.
- Paid parental and adoption leave.
- Professional development opportunities for all employees regardless of level.