Researcher, Agentic Post-Training (AI)
Job description
TL;DR
Researcher, Agentic Post-Training (AI): Own end-to-end research and engineering projects that improve post-training of the company's agentic models shipped across Codex, the API, and ChatGPT, with an emphasis on factuality, instruction following, function calling, multi-agent collaboration, calibrated reasoning, and tool use. Focus on developing horizontal model improvements and on building training infrastructure, evals, diagnostics, and feedback loops from product usage.
Location: San Francisco (onsite)
Salary: $295K – $445K
Company
An AI research and deployment company pushing the boundaries of AI systems through products like ChatGPT, Codex, and the API.
What you will do
- Own end-to-end research and engineering projects improving final post-training of agentic models.
- Decide, in collaboration with partner teams, which integrations are ready for major model runs.
- Develop horizontal improvements across factuality, instruction following, tool calling, multi-agent behavior, and reasoning calibration.
- Build and improve training, evaluation, grading, and data infrastructure for large-scale RL/post-training.
- Create evals and diagnostics to assess model readiness for shipping.
- Enhance feedback loops from real product usage into post-training, including implicit user feedback.
- Collaborate with Codex, API, ChatGPT, product, training, and other post-training teams.
Requirements
- Strong ML fundamentals and hands-on experience with LLMs, RL, RLHF, post-training, evals, or model training.
- Unusually strong engineering skills to move quickly in complex systems and make pragmatic decisions.
- Ability to own ambiguous problems end-to-end without tight roadmaps.
- Focus on impact over methods, and comfort with unglamorous, load-bearing work.
- Excellent taste in model behavior across user-facing domains.
- Comfort working across research, infrastructure, data, evals, and product boundaries.
Nice to have
- Experience with large-scale model training or RL systems.
- Experience building evals, graders, reward models, or data pipelines for LLM training.
- Experience with coding agents, tool-using agents, function calling, or multi-agent systems.
- Background in quant, systems, or infrastructure work involving high-stakes experimentation.
- Strong product taste in writing, design, code generation, or agent workflows.
Culture & Benefits
- Work on frontier agentic models powering products used by hundreds of millions.
- High-agency environment for deeply technical, independent, goal-oriented researchers.
- Equal opportunity employer committed to diversity and reasonable accommodations for disabilities.
- Background checks are conducted per applicable law, and qualified applicants with records are considered.