Inference Engineer (AI)
Job description
TL;DR
Inference Engineer (AI): Optimize and scale foundation model inference on Blackwell clusters, with an emphasis on cost per token, throughput, and latency. The role focuses on complex systems challenges such as disaggregated prefill/decode, the KV-cache hierarchy, and low-precision MoE serving.
Location: Must be based in the Netherlands or Switzerland, with an expectation of spending at least 50% of time in the office.
Company
The company is building a next-generation agentic clinical AI assistant designed to support clinicians with longitudinal patient context and complex diagnostic workflows.
What you will do
- Instrument and analyze the inference stack on Blackwell to optimize token cost, throughput, and latency.
- Tune scheduling and admission control to maintain cost efficiency across ramp-up and steady-state regimes.
- Manage the KV-cache hierarchy and optimize the prefill/decode split.
- Drive low-precision MoE serving while implementing quality regression gates.
- Collaborate with product and research teams to deploy new models and workloads.
Requirements
- Must be based in the Netherlands or Switzerland.
- Deep GPU systems experience, including kernel-level CUDA or Triton development.
- Proficiency with CUTLASS, FlashInfer/FlashAttention, and Nsight profiling.
- Proven experience shipping production inference stacks at scale (e.g., vLLM, SGLang, TensorRT-LLM).
- Strong understanding of roofline models, arithmetic intensity, and KV-cache costs.
Nice to have
- Experience with quantization kernels (FP8/FP4, AWQ/GPTQ).
- Expertise in MoE serving, including expert parallelism and routing.
- Experience scheduling shared training and inference workloads.
- Background in healthcare or regulated-deployment environments.
Culture & Benefits
- Competitive salary, pension plan, and 25 days of vacation.
- EUR 1000 annual learning and development budget.
- Regular offsites and team events.
- Annual commuting subsidy.
- Flexible work environment focused on autonomy and ownership.
Hiring process
- Screening call to align on motivation and professional goals.
- Technical take-home assessment.
- Technical assessment debrief to discuss problem-solving and team fit.
- Final onsite interview to discuss long-term alignment and impact.