Company hidden
1 day ago

ML Platform Engineer (AI)

Work format
Hybrid
Employment type
Full-time
Grade
Middle
English
B2
Country
Netherlands/Switzerland
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

ML Platform Engineer (AI/MLOps): Architecting and scaling infrastructure for foundation-model training and serving, with an emphasis on compute orchestration and model lifecycle management. The focus is on building GPU-backed serving and distributed compute layers using Ray and Kubernetes, and on ensuring that models move reliably from research to production.

Location: Must be based in the Netherlands or Switzerland (hybrid: at least 50% office time)

Company

Building a next-generation agentic clinical AI assistant to help clinicians reason across patient data and diagnostics.

What you will do

  • Design and evolve infrastructure for fast, reliable, and observable ML development using IaC, CI/CD, and Kubernetes.
  • Scale GPU workloads across on-prem and cloud clusters using Kubernetes and Ray.
  • Own and evolve the AI Factory, specifically the Dagster-based orchestrator and its Ray integration.
  • Build and maintain the model lifecycle layer, including experiment tracking, registry, versioning, and GPU-backed serving.
  • Collaborate with research and product engineering to translate platform requirements into shared infrastructure.
  • Implement engineering rigor through lineage, reproducibility, and comprehensive documentation.

Requirements

  • 2-5 years of experience in production ML platform engineering or MLOps.
  • Proficiency with Kubernetes, Helm, Terraform, Docker, and CI/CD tooling (ArgoCD, GitHub Actions).
  • Experience scheduling GPU workloads on Kubernetes or Ray.
  • Hands-on experience with Linux and NVIDIA GPU environments, including multi-node training stacks and InfiniBand.
  • Familiarity with the full ML workflow: training runs, experiment tracking (MLflow), and model serving.
  • Strong software engineering skills in Python.

Nice to have

  • Experience supporting large-scale foundation-model training/inference (vLLM, Triton, TorchServe).
  • Knowledge of lower-level GPU communication and I/O (RDMA, GPUDirect, NCCL).
  • Experience with Kubernetes-native scheduling for accelerators (Volcano, KAI Scheduler, YuniKorn).
  • Experience with high-performance parallel filesystems (Hammerspace, Ceph, WEKA).
  • Exposure to MoE architectures or large-scale distributed training.

Culture & Benefits

  • Competitive salary, pension plan, and 25 vacation days per year.
  • EUR 1000 annual learning and development budget.
  • High degree of autonomy and ownership over goals and critical decisions.
  • Collaborative, international team environment with an emphasis on ambition.
  • Annual commuting subsidy and flexible work arrangements.

Hiring process

  • Screening call to align on motivation and initial fit.
  • Time-limited coding assessment followed by a live debrief session.
  • Deep-dive technical interview focusing on problem-solving and role-specific scenarios.
  • Optional onsite meeting and final executive conversation for cultural alignment.
