AI Evaluation Engineer (AI)
Job description
TL;DR
AI Evaluation Engineer (AI): Build and scale the evaluation harness for consumer-grade AI agents, with an emphasis on model capability, agentic behavior, and on-device performance. The role focuses on designing robust eval suites, instrumenting real-user behavior, and establishing the quality bar for production releases.
Location: Must be based in San Francisco (Onsite)
Company
A stealth startup founded by elite researchers from Stanford, OpenAI, and DeepMind, building trustworthy consumer-grade AI agents.
What you will do
- Design and implement eval suites that gate every model and agent release.
- Build dashboards and tooling to accelerate researcher experiment loops.
- Define and enforce the quality bar for what counts as ready to ship.
- Collaborate with researchers to ensure metrics align with desired model behaviors.
- Instrument real-user behavior on devices to bridge the gap between lab and production.
- Translate performance metrics into actionable insights for OEM partners.
Requirements
- Must be based in San Francisco and work in person.
- Experience measuring non-deterministic systems, including agentic behavior and long-horizon tasks.
- Deep understanding of on-device performance trade-offs.
- Ability to define and maintain rigorous quality standards for AI products.
- Strong communication skills to influence research roadmaps and product decisions.
Culture & Benefits
- Competitive cash compensation and meaningful equity.
- Top-tier relocation and immigration support.
- Opportunity to work on cutting-edge consumer AI agents with industry leaders.
- Fast-paced environment with high impact on product shipping decisions.
Hiring process
- Submit a link to an evaluation, benchmark, or measurement system you built with a brief explanation of its impact.
- Exceptional candidates can expect a response within 48 hours.