Machine Learning Engineer (LLM Evals)
Job description
TL;DR
Machine Learning Engineer (LLM Evals & Observability): Design and curate evaluation datasets and pipelines for AI assistants and agents, with an emphasis on LLM-powered judges, quality metrics, and observability infrastructure. Focus on building scalable evals, evaluating model changes, and closing the loop for continuous improvement using customer feedback and automated techniques.
Location: Hybrid (3-4 days a week in one of our SF Bay Area offices)
Salary: $200,000 - $300,000 annually
Company
The company is the Work AI platform powering intelligent enterprise search, an AI Assistant, and scalable AI agents, with over 100 SaaS connectors.
What you will do
- Design and curate evaluation datasets with sampling strategies, query diversity, and golden sets for reliable coverage of assistant behavior.
- Build and maintain large-scale evaluation pipelines measuring quality across thousands of real user queries.
- Develop LLM-powered judges that score correctness, completeness, and response quality in line with human judgment.
- Evaluate new models and product changes to gate launches and prevent regressions.
- Build observability for AI agents including trace enrichment, data pipelines, and dashboards.
- Close the quality loop using eval results, feedback, and techniques like automated prompt iteration.
- Collaborate across teams to integrate evals into the shipping process.
Requirements
- 2+ years of software engineering experience with strong coding skills
- Strong backend fundamentals in Go and Python; comfortable with distributed data pipelines
- Experience with LLM evaluation, RLHF, NLP, or large ML systems
- Analytically rigorous mindset focused on metrics that predict user experience
- Team player who thrives in a customer-focused, cross-functional environment
- Deep care for quality in systems and product improvement
Culture & Benefits
- Comprehensive benefits: medical, vision, and dental coverage, generous time off, and a 401(k) contribution.
- Home office improvement stipend, annual education and wellness stipends.
- Vibrant culture with regular events and daily healthy lunches.
- Commitment to diversity, inclusion, and AI fluency for all hires.
Hiring process
- AI-focused exercise or discussion in interviews to assess AI thinking and usage.