Machine Learning Engineer - Evaluation (AI)
Job description
TL;DR
Machine Learning Engineer (Evaluation): Build LLM-powered evaluation pipelines that assess AI usage skills consistently and at scale, with an emphasis on rubric design, model reliability, and bias auditing. Focus on running experiments to define effective evaluation methodologies, building RAG pipelines and fine-tuning workflows, and creating benchmarking infrastructure for production-grade assessments.
Location: Hybrid in Santa Clara, CA
Salary: $120,000 - $235,000, plus 10% annual bonus, equity, and benefits
Company
Skills-based hiring platform trusted by 2,500+ innovative companies, including NVIDIA, Amazon, and Microsoft, to evaluate and upskill developers.
What you will do
- Build LLM-powered evaluation pipelines for consistent, fair assessment of AI-assisted coding skills at production scale
- Own end-to-end evaluation methodology including rubrics, model application, auditing for bias, and explainability
- Design and run experiments to determine optimal evaluation practices in AI-augmented environments
- Develop RAG pipelines and fine-tuning workflows to enforce reliable model adherence to evaluation rules
- Define benchmarking infrastructure to measure evaluation quality improvements and detect regressions
- Translate model outputs into understandable results for product managers, customers, and candidates
Requirements
- Shipped LLM-powered systems in production with strict consistency and reliability requirements
- Rigorous approach to model evaluation and measurement, prioritizing well-constructed evals
- Research mindset for inventing methodologies in unsolved problem spaces
- Systems thinking across data pipelines, models, serving layers, and enforced rubrics
- Ability to explain ML judgments clearly to non-ML stakeholders
Nice to have
- Experience with evaluation frameworks for generative or conversational AI
- Background in educational assessment, psychometrics, or large-scale human-in-the-loop evaluation
- Publications or open-source contributions in LLM evaluation, benchmarking, or alignment
- Prior work shipping research into production products
Culture & Benefits
- High standards, urgency, and deep care for impactful work and details
- Comprehensive benefits package including cash and non-cash perks
- Equity through stock options
- Equal opportunity employer committed to diversity and inclusion