Company hidden
7 hours ago

AI Evaluation Engineer (AI)

Work format: onsite
Employment type: fulltime
Grade: senior
English: b2
Country: US
Relocation: US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

AI Evaluation Engineer (AI): Build and scale the evaluation harness for consumer-grade AI agents, with an emphasis on model capability, agentic behavior, and on-device performance. The role focuses on designing robust eval suites, instrumenting real-user behavior, and establishing the quality bar for production releases.

Location: Must be based in San Francisco (Onsite)

Company

A stealth startup founded by elite researchers from Stanford, OpenAI, and DeepMind, building trustworthy consumer-grade AI agents.

What you will do

  • Design and implement eval suites that gate every model and agent release.
  • Build dashboards and tooling to accelerate researcher experiment loops.
  • Define and enforce the quality bar for what counts as ready to ship.
  • Collaborate with researchers to ensure metrics align with desired model behaviors.
  • Instrument real-user behavior on devices to bridge the gap between lab and production.
  • Translate performance metrics into actionable insights for OEM partners.

Requirements

  • Must be based in San Francisco and work in person.
  • Experience measuring non-deterministic systems, including agentic behavior and long-horizon tasks.
  • Deep understanding of on-device performance trade-offs.
  • Ability to define and maintain rigorous quality standards for AI products.
  • Strong communication skills to influence research roadmaps and product decisions.

Culture & Benefits

  • Competitive cash compensation and meaningful equity.
  • Top-tier relocation and immigration support.
  • Opportunity to work on cutting-edge consumer AI agents with industry leaders.
  • Fast-paced environment with high impact on product shipping decisions.

Hiring process

  • Submit a link to an evaluation, benchmark, or measurement system you built with a brief explanation of its impact.
  • Exceptional candidates can expect a response within 48 hours.
