Company hidden
10 hours ago

Researcher, Agentic Post-Training (AI)

$295,000 - $445,000
Work format: onsite
Employment type: fulltime
English: B2
Country: US

Job description


TL;DR

Researcher, Agentic Post-Training (AI): Own end-to-end research and engineering projects that improve post-training of hirify.global’s agentic models shipped across Codex, the API, and ChatGPT, with an emphasis on factuality, instruction following, function calling, multi-agent collaboration, calibrated reasoning, and tool use. The role focuses on developing horizontal model improvements and on building training infrastructure, evals, diagnostics, and feedback loops from product usage.

Location: San Francisco (onsite)

Salary: $295K – $445K

Company

An AI research and deployment company pushing the boundaries of AI systems through products such as ChatGPT, Codex, and the API.

What you will do

  • Own end-to-end research and engineering projects improving final post-training of agentic models.
  • Decide, in collaboration with partner teams, which integrations are ready for major model runs.
  • Develop horizontal improvements across factuality, instruction following, tool calling, multi-agent behavior, and reasoning calibration.
  • Build and improve training, evaluation, grading, and data infrastructure for large-scale RL/post-training.
  • Create evals and diagnostics to assess model readiness for shipping.
  • Enhance feedback loops from real product usage into post-training, including implicit user feedback.
  • Collaborate with Codex, API, ChatGPT, product, training, and other post-training teams.

Requirements

  • Location: San Francisco (onsite)
  • Strong ML fundamentals and hands-on experience with LLMs, RL, RLHF, post-training, evals, or model training.
  • Unusually strong engineering skills to move quickly in complex systems and make pragmatic decisions.
  • Ability to own ambiguous problems end-to-end without tight roadmaps.
  • Focus on impact over methods; comfortable with unglamorous, load-bearing work.
  • Excellent taste in model behavior across user-facing domains.
  • Comfort working across research, infrastructure, data, evals, and product boundaries.

Nice to have

  • Experience with large-scale model training or RL systems.
  • Experience building evals, graders, reward models, or data pipelines for LLM training.
  • Experience with coding agents, tool-using agents, function calling, or multi-agent systems.
  • Background in quant, systems, or infrastructure for high-stakes experimentation.
  • Strong product taste in writing, design, code generation, or agent workflows.

Culture & Benefits

  • Work on frontier agentic models powering products used by hundreds of millions.
  • High-agency environment for deeply technical, independent, goal-oriented researchers.
  • Equal opportunity employer committed to diversity and reasonable accommodations for disabilities.
  • Background checks are conducted per applicable law; qualified applicants with records are considered.
