Inference Engineer (AI)

Work format

hybrid

Employment type

fulltime

Grade

senior

English

Country

Netherlands/Switzerland

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Inference Engineer (AI): Optimize and scale foundation model inference on Blackwell clusters, with an emphasis on cost per token, throughput, and latency. The role centers on hard systems problems such as disaggregated prefill/decode, the KV-cache hierarchy, and low-precision MoE serving.

Location: Must be based in the Netherlands or Switzerland, with an expectation of spending at least 50% of time in the office.

Company

hirify.global is building a next-generation agentic clinical AI assistant designed to support clinicians with longitudinal patient context and complex diagnostic workflows.

What you will do

Instrument and analyze the inference stack on Blackwell to optimize token cost, throughput, and latency.
Tune scheduling and admission control to maintain cost efficiency across ramp-up and steady-state regimes.
Manage the KV-cache hierarchy and optimize the prefill/decode split.
Drive low-precision MoE serving while implementing quality regression gates.
Collaborate with product and research teams to deploy new models and workloads.

Requirements

Must be based in the Netherlands or Switzerland.
Deep GPU systems experience, including kernel-level CUDA or Triton development.
Proficiency with CUTLASS, FlashInfer/FlashAttention, and Nsight profiling.
Proven experience shipping production inference stacks at scale (e.g., vLLM, SGLang, TensorRT-LLM).
Strong understanding of roofline models, arithmetic intensity, and KV-cache costs.

Nice to have

Experience with quantization kernels (FP8/FP4, AWQ/GPTQ).
Expertise in MoE serving, including expert parallelism and routing.
Experience scheduling shared training and inference workloads.
Background in healthcare or regulated-deployment environments.

Culture & Benefits

Competitive salary, pension plan, and 25 days of vacation.
EUR 1000 annual learning and development budget.
Regular offsites and team events.
Annual commuting subsidy.
Flexible work environment focused on autonomy and ownership.

Hiring process

Screening call to align on motivation and professional goals.
Technical take-home assessment.
Technical assessment debrief to discuss problem-solving and team fit.
Final onsite interview to discuss long-term alignment and impact.