Principal Data Scientist (Agent Builder)
Job description
TL;DR
Principal Data Scientist (Agent Builder): Building and scaling evaluation and quality frameworks for conversational and agentic search on the company's agent platform, with an emphasis on RAG quality, groundedness/citations, and retrieval and ranking evaluation. The role focuses on designing offline/online metrics, calibrating LLM-as-judge, and productionizing evaluation pipelines to turn ambiguous LLM behavior into measurable product improvements.
Company
The company builds the Search AI Platform, which combines search precision with AI intelligence for real-time answers.
What you will do
- Define evaluation strategy for conversational and agentic search, including offline/online evaluation, golden datasets, rubrics, LLM-as-judge calibration, groundedness/citation checks, and A/B testing.
- Design quality metrics and decision frameworks for RAG, agents, tools, model selection, agent routing, prompt behavior, and cost/latency trade-offs.
- Build and compare retrieval improvements across sparse/dense retrieval, vector search, query understanding, semantic rewrites, and context enrichment.
- Translate experimental results into product decisions for model choice, efficient routing, tool exposure, and agent customization across use cases.
- Partner with engineering to productionize evaluation pipelines, telemetry, dashboards, CI guardrails, and regression detection for chat quality and related KPIs.
- Mentor data scientists and engineers on experiment design, evaluation methodology, statistical rigor, and improving LLM-powered systems.
Requirements
- 8+ years of applied DS/ML experience with deep expertise in IR, NLP, ranking, semantic search, RAG, or LLM-powered product experiences.
- Proven experience leading evaluation for production AI/ML systems, including offline metrics, online experimentation, LLM-as-judge, groundedness/citation quality, and model comparison.
- Hands-on ability with Python and common ML tooling (PyTorch/Transformers, Pandas, notebooks, reproducible experiments, versioned datasets).
- Strong understanding of retrieval systems (dense/sparse retrieval, re-ranking, vector search, query understanding) and evaluation metrics (nDCG, MRR, Recall@k, precision) plus latency/cost trade-offs.
- Experience collaborating with engineering to move prototypes to production using telemetry, dashboards, CI guardrails, and quality regression tracking.
- Practical search experience (ES|QL familiarity is a plus).
Culture & Benefits
- Competitive pay with health coverage for you and your family in many locations.
- Flexible locations and schedules for many roles.
- Generous vacation days and up to 40 hours/year for volunteer projects.
- Parental leave with a minimum of 16 weeks.
- Security and privacy responsibilities aligned with the company's Secure Software Development Framework (SSDF).
Hiring process
- Interviews focused on evaluation methodology, technical trade-offs, and applied leadership in LLM-powered systems.
- Cross-functional discussions with engineering, product, UX, and data science stakeholders.