AI Evaluation Engineer (AI)

Work format

onsite

Employment type

fulltime

Seniority

senior

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

AI Evaluation Engineer (AI): Building and scaling the evaluation harness for consumer-grade AI agents, with an emphasis on model capability, agentic behavior, and on-device performance. Focus on designing robust eval suites, instrumenting real-user behavior, and establishing the quality bar for production releases.

Location: Must be based in San Francisco (Onsite)

Company

A stealth startup founded by elite researchers from Stanford, OpenAI, and DeepMind, building trustworthy consumer-grade AI agents.

What you will do

Design and implement eval suites that gate every model and agent release.
Build dashboards and tooling to accelerate researcher experiment loops.
Define and enforce the quality bar for what counts as ready to ship.
Collaborate with researchers to ensure metrics align with desired model behaviors.
Instrument real-user behavior on devices to bridge the gap between lab and production.
Translate performance metrics into actionable insights for OEM partners.

Requirements

Must be based in San Francisco and work in person.
Experience measuring non-deterministic systems, including agentic behavior and long-horizon tasks.
Deep understanding of on-device performance trade-offs.
Ability to define and maintain rigorous quality standards for AI products.
Strong communication skills to influence research roadmaps and product decisions.

Culture & Benefits

Competitive cash compensation and meaningful equity.
Top-tier relocation and immigration support.
Opportunity to work on cutting-edge consumer AI agents with industry leaders.
Fast-paced environment with high impact on product shipping decisions.

Hiring process

Submit a link to an evaluation, benchmark, or measurement system you built with a brief explanation of its impact.
Exceptional candidates can expect a response within 48 hours.
