Company hidden
1 day ago

Training Performance Engineer (AI)

Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
Netherlands/Switzerland

Job description

TL;DR

Training Performance Engineer (AI): Optimize large-scale foundation model training on Blackwell clusters, with an emphasis on kernel-level performance, throughput, and cluster fabric efficiency. The role centers on low-precision training, MoE parallelism, and custom attention-variant kernels, with the goal of raising MFU (Model FLOPs Utilization) and uptime.

Location: Must be based in the Netherlands or Switzerland, with an expectation of at least 50% time in the office.

Company

hirify.global is a well-funded startup building a next-generation agentic clinical AI assistant designed to support complex diagnostic workflows and clinical decision-making.

What you will do

  • Instrument and analyze training runs to identify and close utilization gaps.
  • Benchmark NCCL collectives over InfiniBand and NVLink to optimize fabric performance.
  • Drive low-precision training initiatives and validate performance gains.
  • Tune MoE parallelism strategies (TP/PP/CP/EP/DP) to optimize communication costs.
  • Implement and integrate custom attention-variant kernels into the training stack.

Requirements

  • Must be based in the Netherlands or Switzerland.
  • Deep experience with GPU systems, including kernel-level CUDA or Triton.
  • Proficiency with CUTLASS, FlashAttention, PyTorch, and Nsight profiling tools.
  • Production experience with NCCL on high-bandwidth interconnects like InfiniBand.
  • Strong understanding of parallelism strategies under memory and MFU constraints.

Nice to have

  • Experience with low-precision training (FP8, dynamic loss scaling).
  • Knowledge of sparse, hybrid, or MLA attention at the kernel level.
  • Proven track record of shipping large-scale MoE training in production.
  • Experience with Megatron or NeMo frameworks.

Culture & Benefits

  • Competitive salary and pension plan.
  • 25 days of annual vacation.
  • EUR 1000 annual learning and development budget.
  • Regular offsites and team events.
  • Flexible work environment with commuting subsidy.

Hiring process

  • Screening call to align on motivation and fit.
  • Technical take-home assessment.
  • Technical assessment debrief and collaboration discussion.
  • Final onsite interview to discuss long-term alignment.
