Company hidden
2 days ago

AI Engineer, Quality (Evals)

$170,000 - $220,000
Work format
remote (USA only)/onsite
Employment type
full-time
English
B2
Country
US
Listed in Hirify Global, a list of international tech companies

Job description

TL;DR

AI Engineer, Quality (Evals): Build evaluation infrastructure for AI agents in complex audit workflows, with an emphasis on a unified evaluation platform, observability, and production feedback loops. The focus is on designing automated pipelines for rapid model evaluation, implementing comparison frameworks, and ensuring reliability at enterprise scale.

Location: San Francisco, CA or Remote (USA). This role is for engineers who value in-person collaboration at our San Francisco, CA office.

Salary: $170K – $220K

Company

San Francisco-based remote-first Vertical AI company building AI agents for audit and advisory workflows in a $100B+ market, trusted by top accounting and consulting firms.

What you will do

  • Design and build a unified evaluation platform as the single source of truth for agentic systems and audit workflows.
  • Build observability systems covering execution tracing, failure modes, and feedback loops from production failures.
  • Own the evaluation infrastructure, including LangSmith and LangGraph integration.
  • Build automated pipelines for rapid model evaluation against critical workflows.
  • Design evaluation harnesses, comparison frameworks, guardrails, and monitoring for quality regressions.
  • Integrate LLMs, tools, retrieval, and logic into reliable agent experiences; prototype quickly and harden for production.

Requirements

  • Location: San Francisco, CA or Remote (USA); in-person collaboration at the SF office is valued.
  • Multiple years of experience shipping production software in complex systems.
  • Experience with TypeScript, React, Python, Postgres.
  • Built and deployed LLM-powered features in production.
  • Implemented evaluation frameworks for models and agents, as well as observability/tracing for AI/ML systems.
  • Worked with vector databases, embedding models, and RAG, as well as evaluation platforms like LangSmith.

Nice to have

  • Experience with audit/accounting workflows.

Culture & Benefits

  • Remote-first with flexible PTO, 401k, wellness benefits including therapy sessions.
  • Technology and work-from-home reimbursement; flexible schedules.
  • Values: Fearless, Fast, Lovable, Owners, Win-win, Inclusive.
  • Competitive compensation with meaningful ownership; early-stage startup impact.
