Principal Machine Learning Engineer, Serving (AI)
Job description
TL;DR
Principal Machine Learning Engineer, Serving (AI): Develop and optimize real-time multimodal models and orchestration platforms for high-throughput inference, with an emphasis on inference optimization, model acceleration, and distributed systems. The focus is on building reliable, high-performance systems that handle thousands of concurrent connections and deliver sub-second inference at scale.
Location: Must be based in or be able to relocate to the Mountain View office
Salary: $270,000 - $500,000 + bonus + equity + benefits
Company
The company is a product-oriented research lab developing real-time multimodal models and the only real-time orchestration platform optimized for thousands of queries per second.
What you will do
- Containerize and optimize models from the research team for reliable production deployment.
- Profile code and optimize performance on NVIDIA GPUs.
- Design and implement custom load balancing solutions.
- Ensure system stability and reliability in production.
- Collaborate with the research team to improve model serving.
Requirements
- Deep understanding of modern serving frameworks and techniques like vLLM or TRT-LLM.
- Hands-on experience with quantization, distillation, caching strategies, continuous batching, paged attention, and speculative decoding.
- Proficiency in C++, CUDA, Rust, or highly optimized Python.
- Experience with Kubernetes, Ray, multi-GPU/multi-node inference, and handling thousands of concurrent connections.
- PhD in CS, Physics, Math, or equivalent practical experience building backend or ML systems.
Culture & Benefits
- Opportunity to solve ambiguous problems and design benchmarks to find solutions.
- Focus on performance, latency, and reliability as first-class product features.
- Flat structure, fast iterations, and minimal process overhead.
- Relocation assistance is offered.