Company hidden

2 days ago

Lead Machine Learning Engineer (Inference & Performance)

Work format

remote

Employment type

full-time

Grade

lead

English

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Lead Machine Learning Engineer (Inference & Performance) (AI): Build and optimize production LLM serving with an emphasis on throughput, latency, and GPU utilization. The role centers on inference and training performance engineering with vLLM/SGLang, profiling bottlenecks, and deploying multiple models at scale on shared GPU clusters with Kubernetes.

Location: Remote

Company

hirify.global builds AI products and platforms.

What you will do

Optimize inference by building and tuning production LLM serving with vLLM and SGLang to maximize throughput and minimize latency.
Profile and accelerate training/inference runs by instrumenting workloads, identifying bottlenecks, and applying the right attention implementations (e.g., FlashAttention) for the target hardware.
Engineer for hardware by applying knowledge of GPU architecture and attention internals to select the right approach for each accelerator (H200, GB200).
Serve at scale by deploying and operating multiple models on shared GPU clusters on GKE with autoscaling, bin-packing, and mixed-workload handling.
Drive efficiency by owning GPU utilization as a first-class metric and improving throughput-per-dollar.
Collaborate with clients to translate performance, latency, and cost requirements into serving and training architectures.

Requirements

5+ years of ML/AI engineering experience with a meaningful focus on performance, infrastructure, or systems.
Proven experience deploying and optimizing models in production.
Demonstrated experience profiling and improving GPU utilization for training and/or inference.
Strong Kubernetes (GKE) experience deploying and autoscaling multiple models on shared GPU clusters.
Mastery of Python and shell scripting; comfort reading and reasoning about CUDA-adjacent performance code is a strong plus.
Knowledge of data engineering and SQL.

Culture & Benefits

Remote work setup.
Ownership-driven approach from profiling through production optimization.
Rigor: measure before optimizing and use data to guide engineering effort.
Consultative collaboration with clients to connect technical performance to business value.
Emphasis on responsible AI development and data privacy.
