Company hidden
1 hour ago

Principal Data Scientist (Agent Builder)

67 000 - 106 000
Job type
Full-time
Grade
Senior
English
B2
Country
UK/Portugal

Job description

TL;DR

Principal Data Scientist (Agent Builder): build and scale evaluation and quality frameworks for conversational and agentic search on hirify.global’s agent platform, with an emphasis on RAG quality, groundedness/citations, and retrieval and ranking evaluation. The role focuses on designing offline/online metrics, calibrating LLM-as-judge, and productionizing evaluation pipelines to turn ambiguous LLM behavior into measurable product improvements.

Company

hirify.global builds the Search AI Platform that combines search precision with AI intelligence for real-time answers.

What you will do

  • Define evaluation strategy for conversational and agentic search, including offline/online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness/citation checks, and A/B testing.
  • Design quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade-offs.
  • Build and compare retrieval improvements across sparse/dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment.
  • Translate experimental results into product decisions for model choice, efficient routing, tool exposure, and agent customization across hirify.global use cases.
  • Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality and related KPIs.
  • Mentor data scientists and engineers on experiment design, evaluation methodology, statistical rigor, and improving LLM-powered systems.

Requirements

  • 8+ years of applied DS/ML experience with deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences.
  • Proven experience leading evaluation for production AI/ML systems, including offline metrics, online experimentation, LLM-as-judge, groundedness/citation quality, and model comparison.
  • Hands-on ability with Python and common ML tooling (PyTorch/Transformers, Pandas, notebooks, reproducible experiments, versioned datasets).
  • Strong understanding of retrieval systems (dense/sparse retrieval, re-ranking, vector search, query understanding) and evaluation metrics (nDCG, MRR, Recall@k, precision) plus latency/cost trade-offs.
  • Experience collaborating with engineering to move prototypes to production using telemetry, dashboards, CI guardrails, and quality regression tracking.
  • Practical Elasticsearch experience (ES|QL familiarity is a plus).

Culture & Benefits

  • Competitive pay with health coverage for you and your family in many locations.
  • Flexible locations and schedules for many roles.
  • Generous vacation days and up to 40 hours/year for volunteer projects.
  • Parental leave with a minimum of 16 weeks.
  • Security and privacy responsibilities aligned with hirify.global’s Secure Software Development Framework (SSDF).

Hiring process

  • Interviews focused on evaluation methodology, technical trade-offs, and applied leadership in LLM-powered systems.
  • Cross-functional discussions with engineering, product, UX, and data science stakeholders.
