Researcher, Connectors - Agent Post-Training (AI)
Job description
TL;DR
Researcher, Connectors - Agent Post-Training (AI): Train frontier agents to interface with professional software through code and APIs, with an emphasis on post-training techniques such as RL and RLHF. The focus is on building training signals, evals, and feedback loops that enable complex multi-step workflows across digital contexts.
Location: San Francisco
Salary: $250,000 – $380,000 USD + Equity
Company
is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
What you will do
- Design and execute experiments to improve agentic model behavior for complex software and plugins.
- Develop end-to-end improvements in the post-training stack, including RL, data pipelines, reward signals, and model-behavior analysis.
- Create evals and environments to identify model failures and convert them into training data, product fixes, or research directions.
- Collaborate with Codex and ChatGPT product teams to translate user needs into model improvements.
- Implement early-training and alignment interventions, including data mixtures, objectives, and synthetic data.
- Optimize large-scale training machinery for better velocity, reliability, and production readiness.
Requirements
- Strong technical fundamentals in machine learning, software engineering, systems, or statistics.
- Hands-on experience with LLMs, RL, RLHF/RLAIF, post-training, or production ML systems.
- Ability to translate vague behavioral problems into concrete experiments, hypotheses, and fixes.
- Comfort working across research, product, infrastructure, and safety boundaries.
- Experience with coding agents, tool-using agents, or synthetic data generation.
- Must be located in San Francisco.
Culture & Benefits
- Opportunity to work on frontier models that land directly in products used by millions of people.
- High-agency environment focusing on open-ended research and engineering challenges.
- Competitive compensation including significant equity offers.
- Collaborative culture across multidisciplinary teams including safety, alignment, and infrastructure.