AI Evaluation Engineer (AI)

$245,000 - $295,000

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

AI Evaluation Engineer (AI/ML): Develop and optimize evaluation frameworks for LLM-based systems and agentic workflows, with an emphasis on data-driven iteration and model quality. The role focuses on designing feedback loops, curating high-leverage datasets, and analyzing model failure modes to improve contract understanding.

Location: Hybrid in San Francisco or New York City

Salary: $245,000 – $295,000

Company

hirify.global is a leading AI contracting platform that transforms legal agreements into intelligent assets for transformative organizations.

What you will do

Analyze training and evaluation datasets to identify distributional gaps and labeling inconsistencies.
Design and execute labeling campaigns, including the development of golden datasets and annotation guidelines.
Build and maintain dashboards to track model accuracy, regression trends, and product-specific KPIs.
Investigate failure modes via prompt clustering, error taxonomy development, and user intent classification.
Operationalize feedback loops by mining product telemetry and human-in-the-loop reviews.
Partner with engineers and PMs to run structured A/B tests and human evaluations for new features.

Requirements

Bachelor's or Master's degree in a quantitative field (Statistics, Computer Science, Data Science, Applied Math).
8+ years of experience in applied ML or data science, preferably in NLP or LLM-based applications.
Strong proficiency in SQL and Python, including experience with Pandas and experiment tracking tools.
Must be based in San Francisco or New York City, or able to work there in a hybrid setup.
Ability to navigate ambiguity and communicate technical insights to cross-functional stakeholders.

Nice to have

Familiarity with LLM eval techniques, Reinforcement Learning from Human Feedback (RLHF), or agentic system design.
Experience with program management.

Culture & Benefits

100% health coverage for employees (medical, dental, and vision).
Market-leading gender-neutral parental leave and compassionate leave policies.
401(k) plan with employer match for US employees.
Monthly stipends for wellbeing and hybrid work.
Mental health support through Modern Health, including therapy and coaching.
