Company hidden
5 days ago

Machine Learning Engineer - Evaluation (AI)

$120,000 - $235,000
Work format: hybrid
Employment type: full-time
Level: senior
English: B2
Country: US
Listing from Hirify.Global (Hirify RU Global, a list of companies with Eastern European roots)

Job description

TL;DR

Machine Learning Engineer (Evaluation): Build LLM-powered evaluation pipelines that assess AI usage skills consistently and at scale, with an emphasis on rubric design, model reliability, and bias auditing. The focus is on running experiments to define effective evaluation methodologies, building RAG pipelines and fine-tuning workflows, and creating benchmarking infrastructure for production-grade assessments.

Location: Hybrid in Santa Clara, CA

Salary: $120,000 - $235,000, plus 10% annual bonus, equity, and benefits

Company

Skills-based hiring platform trusted by 2,500+ innovative companies like NVIDIA, Amazon, and Microsoft to evaluate and upskill developers.

What you will do

  • Build LLM-powered evaluation pipelines for consistent, fair assessment of AI-assisted coding skills at production scale
  • Own the end-to-end evaluation methodology, including rubric design, model application, bias auditing, and explainability
  • Design and run experiments to determine optimal evaluation practices in AI-augmented environments
  • Develop RAG pipelines and fine-tuning workflows that ensure models reliably adhere to evaluation rules
  • Define benchmarking infrastructure to measure evaluation quality improvements and detect regressions
  • Translate model outputs into understandable results for product managers, customers, and candidates

Requirements

  • Experience shipping LLM-powered systems to production under strict consistency and reliability requirements
  • Rigorous approach to model evaluation and measurement, prioritizing well-constructed evals
  • Research mindset for inventing methodologies in unsolved problem spaces
  • Systems thinking across data pipelines, models, serving layers, and enforced rubrics
  • Ability to explain ML judgments clearly to non-ML stakeholders

Nice to have

  • Experience with evaluation frameworks for generative or conversational AI
  • Background in educational assessment, psychometrics, or large-scale human-in-the-loop evaluation
  • Publications or open-source contributions in LLM evaluation, benchmarking, or alignment
  • Prior work shipping research into production products

Culture & Benefits

  • High standards, a sense of urgency, and deep care for impactful work and attention to detail
  • Comprehensive benefits package including cash and non-cash perks
  • Equity through stock options
  • Equal opportunity employer committed to diversity and inclusion
