Company hidden
2 days ago

Lead Machine Learning Engineer (Inference & Performance)

Work format
remote
Employment type
full-time
Grade
lead
English
B2
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Lead Machine Learning Engineer (Inference & Performance) (AI): Build and optimize production LLM serving with a focus on throughput, latency, and GPU utilization. The role centers on engineering inference and training performance with vLLM/SGLang, profiling bottlenecks, and deploying multiple models at scale on shared GPU clusters with Kubernetes.

Location: Remote

Company

hirify.global builds AI products and platforms.

What you will do

  • Optimize inference by building and tuning production LLM serving with vLLM and SGLang to maximize throughput and minimize latency.
  • Profile and accelerate training/inference runs by instrumenting workloads, identifying bottlenecks, and applying the right attention implementations (e.g., FlashAttention) for the target hardware.
  • Engineer for hardware by applying knowledge of GPU architecture and attention internals to select the right approach for each accelerator (H200, GB200).
  • Serve at scale by deploying and operating multiple models on shared GPU clusters on GKE with autoscaling, bin-packing, and mixed-workload handling.
  • Drive efficiency by owning GPU utilization as a first-class metric and improving throughput-per-dollar.
  • Collaborate with clients to translate performance, latency, and cost requirements into serving and training architectures.

Requirements

  • 5+ years of ML/AI engineering experience with a meaningful focus on performance, infrastructure, or systems.
  • Proven experience deploying and optimizing models in production.
  • Demonstrated experience profiling and improving GPU utilization for training and/or inference.
  • Strong Kubernetes (GKE) experience deploying and autoscaling multiple models on shared GPU clusters.
  • Mastery of Python and shell scripting; comfort reading and reasoning about CUDA-adjacent performance code is a strong plus.
  • Knowledge of data engineering and SQL.

Culture & Benefits

  • Remote work setup.
  • Ownership-driven approach from profiling through production optimization.
  • Rigor: measure before optimizing and use data to guide engineering effort.
  • Consultative collaboration with clients to connect technical performance to business value.
  • Emphasis on responsible AI development and data privacy.