Company hidden
Posted 1 day ago

Inference Engineer (AI)

Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: Netherlands/Switzerland

Job description


TL;DR

Inference Engineer (AI): Optimize and scale foundation model inference on Blackwell clusters, with an emphasis on cost per token, throughput, and latency. The role centers on complex systems challenges such as disaggregated prefill/decode, the KV-cache hierarchy, and low-precision MoE serving.

Location: Must be based in the Netherlands or Switzerland, with an expectation of spending at least 50% of time in the office.

Company

hirify.global is building a next-generation agentic clinical AI assistant designed to support clinicians with longitudinal patient context and complex diagnostic workflows.

What you will do

  • Instrument and analyze the inference stack on Blackwell to optimize token cost, throughput, and latency.
  • Tune scheduling and admission control to maintain cost efficiency across ramp-up and steady-state regimes.
  • Manage the KV-cache hierarchy and optimize the prefill/decode split.
  • Drive low-precision MoE serving while implementing quality regression gates.
  • Collaborate with product and research teams to deploy new models and workloads.

Requirements

  • Must be based in the Netherlands or Switzerland.
  • Deep GPU systems experience, including kernel-level CUDA or Triton development.
  • Proficiency with CUTLASS, FlashInfer/FlashAttention, and Nsight profiling.
  • Proven experience shipping production inference stacks at scale (e.g., vLLM, SGLang, TensorRT-LLM).
  • Strong understanding of roofline models, arithmetic intensity, and KV-cache costs.

Nice to have

  • Experience with quantization kernels (FP8/FP4, AWQ/GPTQ).
  • Expertise in MoE serving, including expert parallelism and routing.
  • Experience scheduling shared training and inference workloads.
  • Background in healthcare or regulated-deployment environments.

Culture & Benefits

  • Competitive salary, pension plan, and 25 days of vacation.
  • EUR 1000 annual learning and development budget.
  • Regular offsites and team events.
  • Annual commuting subsidy.
  • Flexible work environment focused on autonomy and ownership.

Hiring process

  • Screening call to align on motivation and professional goals.
  • Technical take-home assessment.
  • Technical assessment debrief to discuss problem-solving and team fit.
  • Final onsite interview to discuss long-term alignment and impact.
