Researcher, Agentic Post-Training (AI)

$295,000 - $445,000

Work format

onsite

Employment type

fulltime

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

Text:

TL;DR

Researcher, Agentic Post-Training (AI): Own end-to-end research and engineering projects that improve post-training of hirify.global’s agentic models shipped across Codex, the API, and ChatGPT, with an emphasis on factuality, instruction following, function calling, multi-agent collaboration, calibrated reasoning, and tool use. The role focuses on developing horizontal model improvements and on building training infrastructure, evals, diagnostics, and feedback loops from product usage.

Location: San Francisco (onsite)

Salary: $295K – $445K

Company

AI research and deployment company pushing boundaries of AI systems through products like ChatGPT, Codex, and API.

What you will do

Own end-to-end research and engineering projects improving final post-training of agentic models.
Decide, in collaboration with partner teams, which integrations are ready for major model runs.
Develop horizontal improvements across factuality, instruction following, tool calling, multi-agent behavior, and reasoning calibration.
Build and improve training, evaluation, grading, and data infrastructure for large-scale RL/post-training.
Create evals and diagnostics to assess model readiness for shipping.
Enhance feedback loops from real product usage into post-training, including implicit user feedback.
Collaborate with Codex, API, ChatGPT, product, training, and other post-training teams.

Requirements

Strong ML fundamentals and hands-on experience with LLMs, RL, RLHF, post-training, evals, or model training.
Unusually strong engineering skills to move quickly in complex systems and make pragmatic decisions.
Ability to own ambiguous problems end-to-end without tight roadmaps.
Focus on impact over methods, and comfort with unglamorous, load-bearing work.
Excellent taste in model behavior across user-facing domains.
Comfort working across research, infrastructure, data, evals, and product boundaries.

Nice to have

Experience with large-scale model training or RL systems.
Experience building evals, graders, reward models, or data pipelines for LLM training.
Experience with coding agents, tool-using agents, function calling, or multi-agent systems.
Background in quant, systems, or infrastructure for high-stakes experimentation.
Strong product taste in writing, design, code generation, or agent workflows.

Culture & Benefits

Work on frontier agentic models powering products used by hundreds of millions.
High-agency environment for deeply technical, independent, goal-oriented researchers.
Equal opportunity employer committed to diversity and reasonable accommodations for disabilities.
Background checks are conducted per applicable law, and qualified applicants with records are considered.
