Senior ML Engineer (Evaluation) (AI)
Job description
TL;DR
Senior ML Engineer (Evaluation) (AI): Own the engineering stack for running large-scale evaluations of Multimodal Large Language Models on clinical benchmarks, with an emphasis on automated pipelines, inference services, and production-grade observability. The focus is on designing scalable workflows, ensuring reproducibility and throughput, and maturing the full eval/MLOps lifecycle to deliver reliable, clinical-grade performance.
Location: Zurich or Amsterdam, hybrid, with roughly 50% of time in the office.
Company
The company is building a next-generation agentic clinical AI assistant that reasons across patient data, guidelines, and diagnostics, with specialized agents in oncology, radiology, and pathology, developed in collaboration with leading hospitals.
What you will do
- Design, operate, and mature automated pipelines for large-scale evaluation jobs across clinical benchmarks.
- Maintain inference and eval services, ensuring correctness, reproducibility, and high throughput as models and benchmarks grow.
- Ensure the integrity of the eval stack through rigorous testing and validation, and support ML researchers.
- Own end-to-end Eval/MLOps: deployments, versioning, data organization, and observability.
- Act as technical lead: set engineering direction, make architectural decisions, and guide other engineers.
Requirements
- Excellent Python skills and strong software engineering fundamentals: testing, modular design, CI/CD, code review, monorepo tooling.
- Experience designing and operating workflow orchestration and automated pipelines, and owning the full deployment lifecycle: containerisation, config management, observability, and incident response.
- Proven experience building/operating ML infrastructure at scale, ideally for LLMs or multimodal models.
- Solid understanding of distributed compute, GPU workloads, cluster scheduling, and resource management.
- Ability to reason about model internals: tokenisation, numerical precision, tensor shapes, inference behaviour.
- Strong motivation to advance clinical AI through excellent engineering.
Nice to have
- Experience as a technical lead: setting direction, making architectural trade-offs, guiding engineers.
- Hands-on experience with a stack such as Dagster, Ray, vLLM, or similar.
- Experience with eval harnesses such as lm-eval-harness or HF Evaluate.
- Safety/reliability mindset: red-teaming, load testing, production AI quality practices.
Culture & Benefits
- Ownership: autonomy to set goals, make decisions, and have direct impact.
- Collaboration: approach disagreement with curiosity, build solutions together.
- Ambition: high standards, relentless pursuit of better patient outcomes.
- Competitive salary, pension plan, 25 vacation days, EUR 1000 learning budget.
- Great offsites, team events, commuting subsidy, work autonomy/flexibility.
Hiring process
- Screening call: align on motivation, goals, initial fit.
- Technical interview: problem-solving via challenge, case study, or scenario.
- Onsite meeting (optional): meet the team to gauge collaboration, fit, and context.
- Final executive conversation: long-term alignment, cultural fit, expectations.