Senior/Lead Machine Learning Engineer (AI)
Job description
TL;DR
Senior/Lead Machine Learning Engineer (AI): Develop and optimize high-performance serving infrastructure for realtime multimodal models, with an emphasis on inference acceleration and distributed systems. The focus is on reducing latency, deploying advanced serving frameworks such as vLLM, and scaling multi-GPU inference to thousands of concurrent queries.
Location: Must be based in Serbia. Future relocation to the US (San Francisco Bay Area) may be available with visa support.
Company
is a product-oriented research lab developing best-in-class realtime multimodal models and a high-throughput orchestration platform.
What you will do
- Optimize inference using modern serving frameworks such as vLLM or TRT-LLM.
- Implement model acceleration via quantization, distillation, caching strategies, and speculative decoding.
- Build high-performance systems using C++, CUDA, Rust, or optimized Python.
- Scale distributed systems using Kubernetes and Ray for multi-GPU/multi-node inference.
- Own the full cycle of taking models from research to production: containerizing them and ensuring reliable serving.
Requirements
- Deep expertise in inference optimization and modern serving techniques.
- Proficiency in high-performance languages (C++, CUDA, Rust) or highly optimized Python.
- Experience with Kubernetes, Ray, and handling thousands of concurrent connections.
- Professional fluency in English (written and spoken) is required.
- Must be located in Serbia.
Nice to have
- PhD in CS, Physics, Math, or equivalent practical experience building backend/ML systems.
- Contributions to major open-source inference engines.
- Non-trivial systems programming projects or deep-dive technical write-ups.
Culture & Benefits
- Flat organizational structure with fast iterations and minimal process theater.
- Engineering culture that values impact and stability over purely theoretical optimizations.
- High ownership environment where performance, latency, and reliability are first-class features.
- Support for sharing work and making open-source contributions.
- Potential for future US visa and relocation support to the San Francisco Bay Area.