Posted 2 days ago

Director, Model Post-Training and Agentic Research (AI)

$195,000 - $290,000

Work format

Remote (USA only)

Employment type

Full-time

Grade

Director

English

Country

Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Director, Model Post-Training and Agentic Research (AI): Build post-training and reinforcement learning capabilities for security-specialized AI systems, with an emphasis on RLHF, RLAIF, and agentic research. The role focuses on designing reward models, building agent harnesses for complex cyber workflows, and developing evaluation methodologies for end-to-end task completion.

Location: Remote (Must be based in the USA)

Salary: $195,000 - $290,000 per year

Company

Global leader in cybersecurity protecting people, processes, and technologies via an AI-native platform.

What you will do

Own and personally drive the full post-training pipeline, including SFT, RLHF/RLAIF, agent-RL, and reward modeling.
Build agent-RL training environments and harnesses, focusing on scaffolding, tool-calling interfaces, and planning loops.
Develop evaluation methodologies for the full agentic stack to measure tool-use reliability and planning coherence.
Partner with internal teams to integrate post-training and agentic work into the broader model development loop.
Recruit and lead a team of research scientists and ML engineers with high talent density, setting the technical bar through active contribution.

Requirements

MS or PhD in Computer Science, Machine Learning, or a related quantitative discipline.
8+ years of experience in ML research or engineering with meaningful depth in LLM post-training.
Hands-on expertise in SFT data pipelines, RLHF/RLAIF, PPO, and reward model design.
Demonstrated experience building agentic system harnesses, including tool-use frameworks and memory management.
Track record of running high-velocity research programs with disciplined tracking and iteration.
Must have US work authorization (E-Verify participant).

Nice to have

Experience building RL training environments, including rollout infrastructure and reward shaping.
Applying RL techniques in security or adversarial ML domains.
Published research in post-training, RLHF, or RL for language agents at top-tier venues (NeurIPS, ICML, ICLR, ACL).
Experience adapting open-weight base models (Llama, Qwen) for domain-specialized training.
Familiarity with security practitioner workflows, such as penetration testing or incident response.

Culture & Benefits

Market leader in compensation and equity awards.
Comprehensive physical and mental wellness programs.
Competitive vacation and holidays to recharge.
Paid parental and adoption leave.
Professional development opportunities for all employees regardless of level.
