Staff / Principal Machine Learning Engineer (AI)
Job description
TL;DR
Staff/Principal Machine Learning Engineer (AI): Build and optimize real-time multimodal model serving and orchestration platforms, with an emphasis on inference optimization and high-performance systems. The focus is on reducing latency, implementing advanced acceleration techniques, and scaling distributed GPU inference across multi-node clusters.
Location: Remote within Switzerland. Candidates must already have the legal right to work in Switzerland (no visa sponsorship available).
Company
An AI research lab developing best-in-class real-time multimodal models and a high-performance orchestration platform.
What you will do
- Optimize model serving using modern frameworks such as vLLM or TRT-LLM.
- Implement acceleration techniques including quantization, distillation, caching strategies, and speculative decoding.
- Develop high-performance systems using C++, CUDA, Rust, or highly optimized Python to maximize NVIDIA GPU utilization.
- Scale distributed inference using Kubernetes and Ray, managing multi-GPU/multi-node setups for thousands of concurrent connections.
- Take full-cycle ownership of models from research to containerization and stable production serving.
Requirements
- Deep understanding of modern serving frameworks and inference optimization techniques.
- Proficiency in C++, CUDA, Rust, or optimized Python with strong profiling skills.
- Experience with Kubernetes, Ray, and custom load balancing for distributed systems.
- Legal right to work in Switzerland (mandatory).
- Professional English fluency (written and spoken) for collaboration with US-based teams.
Nice to have
- PhD in CS, Physics, Math, or equivalent practical experience building ML systems.
- Open-source contributions to major inference engines.
- Non-trivial systems programming projects or deep-dive technical write-ups.
Culture & Benefits
- Flat organizational structure with fast iterations and minimal process theater.
- High-impact environment where stability and shipping are prioritized over theoretical optimization.
- Culture of ownership where engineers are encouraged to question architecture and design benchmarks.
- Potential for future full U.S. visa and relocation support to the San Francisco Bay Area, subject to business needs.