Company hidden
Updated 1 day ago

Researcher, Agent Post-Training (AI)

Work format
onsite
Employment type
fulltime
English
b2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Researcher, Agent Post-Training (AI): Improve the capabilities, reliability, and product fit of agentic models for power users and API developers, with an emphasis on post-training interventions and behavior improvement. The role focuses on designing evals from real developer workflows, building training environments, and optimizing tool use and long-horizon execution.

Location: San Francisco, USA

Company

An AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.

What you will do

  • Design and run experiments to improve model behavior in API and power-user workflows, including function calling, coding, and planning.
  • Build evals, graders, and environments based on real developer workflows to convert failures into training data and hypotheses.
  • Partner with API and product teams to identify behavior gaps and implement post-training interventions.
  • Own end-to-end model behavior projects from qualitative failure analysis through data generation to launch readiness.
  • Develop feedback loops using power-user traces and production-like environments to discover new model gaps.
  • Improve the machinery for large-scale training, focusing on experiment velocity, reliability, and observability.

Requirements

  • Must be based in San Francisco, USA
  • Strong technical fundamentals in ML, software engineering, systems, or statistics.
  • Hands-on experience with LLMs, post-training, RL/RLHF/RLAIF, evals, or production ML systems.
  • Proven ability to turn ambiguous model behavior problems into concrete progress.
  • Experience with synthetic data, coding agents, or tool-using agents.
  • Strong taste for model behavior, with the ability to form hypotheses from traces and API interactions.
