Company hidden
2 days ago

Principal Machine Learning Engineer, Serving (AI)

$270,000 - $500,000
Work format
onsite
Employment type
fulltime
Grade
senior
English
b2
Country
US
Relocation
US
Job from the Hirify.Global list: Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Principal Machine Learning Engineer, Serving (AI): Developing and optimizing real-time multimodal models and orchestration platforms for high-throughput inference, with an emphasis on inference optimization, model acceleration, and distributed systems. Focus on building reliable, high-performance systems that can handle thousands of concurrent connections and deliver sub-second inference at scale.

Location: Must be based near or able to relocate to the Mountain View office

Salary: $270,000 - $500,000+ bonus + equity + benefits

Company

hirify.global is a product-oriented research lab developing real-time multimodal models and the only real-time orchestration platform optimized for thousands of queries per second.

What you will do

  • Containerize and optimize models from the research team for reliable production deployment.
  • Profile code and optimize performance on NVIDIA GPUs.
  • Design and implement custom load balancing solutions.
  • Ensure system stability and reliability in production.
  • Collaborate with the research team to improve model serving.

Requirements

  • Deep understanding of modern serving frameworks, such as vLLM or TRT-LLM, and their underlying techniques.
  • Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
  • Proficiency in C++, CUDA, Rust, or highly optimized Python.
  • Experience with Kubernetes, Ray, multi-GPU/multi-node inference, and handling thousands of concurrent connections.
  • PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems.

Culture & Benefits

  • Opportunity to solve open-ended problems and design benchmarks to find solutions.
  • Focus on performance, latency, and reliability as first-class product features.
  • Flat structure, fast iterations, and minimal process overhead.
  • Relocation assistance is offered.
