TL;DR
Senior Data Scientist (AI): Build and optimize NLP/LLM evaluation solutions across multiple product lines, with an emphasis on statistically sound test design, metrics, and production-ready code. The role focuses on translating product requirements into executable study plans, ensuring reproducible pipelines, and delivering concise readouts that inform roadmap decisions and risk mitigation.
Location: Netherlands (Onsite)
Company
hirify.global's AI Evaluation team designs, builds, and operates NLP/LLM evaluation solutions for various AI products, partnering with Product, Technology, Domain SMEs, and Governance.
What you will do
- Design and implement end-to-end evaluation studies and pipelines for AI products.
- Translate product questions into hypotheses, tasks, rubrics, datasets, and success criteria, and define key metrics.
- Build and maintain Python/SQL evaluation pipelines for data preparation, prompt/rubric generation, scoring, and reporting.
- Ensure statistical rigor through power analysis, confidence intervals, inter-rater reliability, and significance testing.
- Produce analyses that highlight regressions, safety risks, and improvement opportunities, and deliver concise write-ups and summaries.
- Produce audit-ready artifacts and follow privacy/security guardrails aligned with Responsible AI practices.
Requirements
- Master's degree plus 3 years of experience, or Bachelor's degree plus 5 years of experience, in CS, Data Science, Statistics, Computational Linguistics, or a related field.
- Strong Python and SQL skills.
- Experience with LLM/NLP evaluation, data versioning, testing, CI, and cloud-based workflows.
- Familiarity with prompt/rubric design and LLM-as-judge patterns.
- Comfortable with power analysis, confidence intervals, hypothesis testing, inter-rater reliability, and error/slice analysis.
- Clear written and oral communication skills for both technical and non-technical stakeholders.
- Must be based in the Netherlands.
Nice to have
- Experience evaluating retrieval-augmented or agentic systems, or measuring safety, bias, and toxicity.
- Familiarity with lightweight orchestration (e.g., Airflow/Prefect) and containerization basics.
- Exposure to healthcare or education content, or experience working with clinician or academic SMEs.
Culture & Benefits
- Focus on well-being and happiness.
- Access to country-specific benefits.
- Commitment to providing a fair and accessible hiring process with accommodations.