Posted 2 days ago

Director, Model Post-Training and Agentic Research (AI)

$195,000 - $290,000
Work format
remote (USA only)
Employment type
full-time
Grade
director
English
C1
Country
US
Job from the Hirify.Global list (Hirify RU Global, a list of companies with Eastern European roots)

Job description

TL;DR

Director, Model Post-Training and Agentic Research (AI): Build post-training and reinforcement learning capabilities for security-specialized AI systems, with an emphasis on RLHF, RLAIF, and agentic research. The role focuses on designing reward models, building agent harnesses for complex cyber workflows, and developing evaluation methodologies for end-to-end task completion.

Location: Remote (Must be based in the USA)

Salary: $195,000 - $290,000 per year

Company

Global leader in cybersecurity protecting people, processes, and technologies via an AI-native platform.

What you will do

  • Own and personally drive the full post-training pipeline, including SFT, RLHF/RLAIF, agent-RL, and reward modeling.
  • Build agent-RL training environments and harnesses, focusing on scaffolding, tool-calling interfaces, and planning loops.
  • Develop evaluation methodologies for the full agentic stack to measure tool-use reliability and planning coherence.
  • Partner with internal teams to integrate post-training and agentic work into the broader model development loop.
  • Recruit and lead a talent-dense team of research scientists and ML engineers, setting the technical bar through active contribution.

Requirements

  • MS or PhD in Computer Science, Machine Learning, or a related quantitative discipline.
  • 8+ years of experience in ML research or engineering with meaningful depth in LLM post-training.
  • Hands-on expertise in SFT data pipelines, RLHF/RLAIF, PPO, and reward model design.
  • Demonstrated experience building agentic system harnesses, including tool-use frameworks and memory management.
  • Track record of running high-velocity research programs with disciplined tracking and iteration.
  • Must have US work authorization (E-Verify participant).

Nice to have

  • Experience building RL training environments, including rollout infrastructure and reward shaping.
  • Applying RL techniques in security or adversarial ML domains.
  • Published research in post-training, RLHF, or RL for language agents at top-tier venues (NeurIPS, ICML, ICLR, ACL).
  • Experience adapting open-weight base models (Llama, Qwen) for domain-specialized training.
  • Familiarity with security practitioner workflows, such as penetration testing or incident response.

Culture & Benefits

  • Market leader in compensation and equity awards.
  • Comprehensive physical and mental wellness programs.
  • Competitive vacation and holidays to recharge.
  • Paid parental and adoption leave.
  • Professional development opportunities for all employees regardless of level.
