Company hidden
4 days ago

Senior ML Engineer (Evaluation) (AI)

Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: Netherlands/Switzerland

Job description

TL;DR

Senior ML Engineer (Evaluation) (AI): Own the engineering stack for running large-scale evaluations of Multimodal Large Language Models on clinical benchmarks, with an emphasis on automated pipelines, inference services, and production-grade observability. Focus on designing scalable workflows, ensuring reproducibility and throughput, and maturing the full eval/MLOps lifecycle for reliable clinical-grade performance.

Location: Based in Zurich or Amsterdam, with about 50% of the time in the office (hybrid).

Company

hirify.global is building a next-generation agentic clinical AI assistant for reasoning across patient data, guidelines, and diagnostics, with specialized agents in oncology, radiology, and pathology, in collaboration with leading hospitals.

What you will do

  • Design, operate, and mature automated pipelines for large-scale evaluation jobs across clinical benchmarks.
  • Maintain inference and eval services for correctness, reproducibility, and high throughput as models and benchmarks grow.
  • Ensure eval stack integrity through rigorous testing, validation, and support for ML researchers.
  • Own end-to-end Eval/MLOps: deployments, versioning, data organization, and observability.
  • Act as technical lead: set engineering direction, make architectural decisions, and guide other engineers.

Requirements

  • Excellent Python skills and strong software engineering fundamentals: testing, modular design, CI/CD, code review, monorepo tooling.
  • Experience designing and operating workflow orchestration and automated pipelines across the full deployment lifecycle: containerisation, config management, observability, and incident response.
  • Proven experience building/operating ML infrastructure at scale, ideally for LLMs or multimodal models.
  • Solid understanding of distributed compute, GPU workloads, cluster scheduling, resource management.
  • Ability to reason about model internals: tokenisation, numerical precision, tensor shapes, inference behaviour.
  • Strong motivation to advance clinical AI through excellent engineering.

Nice to have

  • Experience as technical lead: setting direction, architectural trade-offs, guiding engineers.
  • Hands-on experience with the stack: Dagster, Ray, vLLM, or similar.
  • Eval harness experience: lm-eval-harness, HF Evaluate.
  • Safety/reliability mindset: red-teaming, load testing, production AI quality practices.

Culture & Benefits

  • Ownership: autonomy to set goals, make decisions, and have direct impact.
  • Collaboration: approach disagreement with curiosity, build solutions together.
  • Ambition: high standards, relentless pursuit of better patient outcomes.
  • Competitive salary, pension plan, 25 vacation days, EUR 1000 learning budget.
  • Great offsites, team events, commuting subsidy, work autonomy/flexibility.

Hiring process

  • Screening call: align on motivation, goals, initial fit.
  • Technical interview: problem-solving via challenge, case study, or scenario.
  • Onsite meeting (optional): meet team for collaboration, fit, context.
  • Final executive conversation: long-term alignment, cultural fit, expectations.
