Machine Learning Engineer - Evaluation (AI)

$120,000 - $235,000

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Machine Learning Engineer (Evaluation): Build LLM-powered evaluation pipelines that assess AI usage skills consistently and at scale, with an emphasis on rubric design, model reliability, and bias auditing. Focus on running experiments to define effective evaluation methodologies, building RAG pipelines and fine-tuning workflows, and creating benchmarking infrastructure for production-grade assessments.

Location: Hybrid in Santa Clara, CA

Salary: $120,000 - $235,000, plus 10% annual bonus, equity, and benefits

Company

Skills-based hiring platform trusted by 2,500+ innovative companies like NVIDIA, Amazon, and Microsoft to evaluate and upskill developers.

What you will do

Build LLM-powered evaluation pipelines for consistent, fair assessment of AI-assisted coding skills at production scale
Own end-to-end evaluation methodology including rubrics, model application, auditing for bias, and explainability
Design and run experiments to determine optimal evaluation practices in AI-augmented environments
Develop RAG pipelines and fine-tuning workflows to enforce reliable model adherence to evaluation rules
Define benchmarking infrastructure to measure evaluation quality improvements and detect regressions
Translate model outputs into understandable results for product managers, customers, and candidates

Requirements

Shipped LLM-powered systems in production with strict consistency and reliability requirements
Rigorous approach to model evaluation and measurement, prioritizing well-constructed evals
Research mindset for inventing methodologies in unsolved problem spaces
Systems thinking across data pipelines, models, serving layers, and enforced rubrics
Ability to explain ML judgments clearly to non-ML stakeholders

Nice to have

Experience with evaluation frameworks for generative or conversational AI
Background in educational assessment, psychometrics, or large-scale human-in-the-loop evaluation
Publications or open-source contributions in LLM evaluation, benchmarking, or alignment
Prior work shipping research into production products

Culture & Benefits

High standards, urgency, and deep care for impactful work and details
Comprehensive benefits package including cash and non-cash perks
Equity through stock options
Equal opportunity employer committed to diversity and inclusion
