Company hidden
18 hours ago

AI Evaluation Engineer (AI)

$245,000 - $295,000
Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

AI Evaluation Engineer (AI/ML): Develop and optimize evaluation frameworks for LLM-based systems and agentic workflows, with an emphasis on data-driven iteration and model quality. The role focuses on designing feedback loops, curating high-leverage datasets, and analyzing model failure modes to improve contract understanding.

Location: Hybrid in San Francisco or New York City

Salary: $245,000 – $295,000

Company

hirify.global is a leading AI contracting platform that transforms legal agreements into intelligent assets for transformative organizations.

What you will do

  • Analyze training and evaluation datasets to identify distributional gaps and labeling inconsistencies.
  • Design and execute labeling campaigns, including the development of golden datasets and annotation guidelines.
  • Build and maintain dashboards to track model accuracy, regression trends, and product-specific KPIs.
  • Investigate failure modes via prompt clustering, error taxonomy development, and user intent classification.
  • Operationalize feedback loops by mining product telemetry and human-in-the-loop reviews.
  • Partner with engineers and PMs to run structured A/B tests and human evaluations for new features.

Requirements

  • Bachelor's or Master's degree in a quantitative field (Statistics, Computer Science, Data Science, Applied Math).
  • 8+ years of experience in applied ML or data science, preferably in NLP or LLM-based applications.
  • Strong proficiency in SQL and Python, including experience with Pandas and experiment tracking tools.
  • Must be based in, or able to work a hybrid schedule in, San Francisco or New York City.
  • Ability to navigate ambiguity and communicate technical insights to cross-functional stakeholders.

Nice to have

  • Familiarity with LLM eval techniques, Reinforcement Learning from Human Feedback (RLHF), or agentic system design.
  • Experience with program management.

Culture & Benefits

  • 100% health coverage for employees (medical, dental, and vision).
  • Market-leading gender-neutral parental leave and compassionate leave policies.
  • 401(k) plan with employer match for US employees.
  • Monthly stipends for wellbeing and hybrid work.
  • Mental health support through Modern Health, including therapy and coaching.
