Software Engineer (ML Inference)
Job Description
TL;DR
Software Engineer, ML Inference: Build and scale end-to-end inference systems for a next-generation AI cloud, with a focus on runtime, serving infrastructure, memory management, and hardware optimisation. The work centres on optimising latency, throughput, and concurrency; designing batching, scheduling, and queuing systems; improving KV cache efficiency; and debugging bottlenecks across the model, runtime, and hardware layers.
Location: San Francisco (On-Site)
Salary: $250,000–$320,000 base + equity
Company
Early-stage infrastructure company building a next-generation AI cloud — rethinking how frontier models run across heterogeneous compute environments.
What you will do
- Build and scale end-to-end inference systems from request to runtime to response
- Optimise latency, throughput, concurrency, and reliability under production workloads
- Design batching, scheduling, and queuing systems for high-performance serving
- Improve KV cache management and memory efficiency at scale
- Debug performance bottlenecks across model, runtime, and hardware layers
- Collaborate with systems, infrastructure, and ML teams to advance inference performance
Requirements
- Experience building ML inference or model serving systems
- Strong systems engineering or backend infrastructure fundamentals
- Experience with performance, scaling, memory, or distributed systems challenges
- Strong Python and/or C++ skills
- Must be based in San Francisco for on-site work
Nice to have
- Familiarity with modern inference frameworks and runtimes (vLLM, TensorRT-LLM, custom runtimes)
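As background on the KV cache management mentioned above: frameworks such as vLLM manage the KV cache in fixed-size blocks (PagedAttention) rather than contiguous per-sequence reservations. The sketch below is a toy block allocator in that spirit; every name in it is hypothetical and it is not taken from any real runtime.

```python
class KVBlockAllocator:
    """Toy paged KV-cache block allocator: the cache is split into fixed-size
    blocks that sequences acquire as they grow and release when they finish,
    avoiding large contiguous reservations. Illustrative sketch only; a real
    system would add preemption, eviction, and block sharing across sequences."""

    def __init__(self, num_blocks: int, block_size: int):
        self.block_size = block_size          # tokens per block
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}                # seq_id -> list of block ids

    def blocks_needed(self, num_tokens: int) -> int:
        return -(-num_tokens // self.block_size)  # ceiling division

    def allocate(self, seq_id, num_tokens: int) -> bool:
        """Grow seq_id's block table to cover num_tokens; True on success."""
        table = self.block_tables.setdefault(seq_id, [])
        needed = self.blocks_needed(num_tokens) - len(table)
        if needed > len(self.free_blocks):
            return False  # out of cache; a real scheduler would preempt here
        for _ in range(needed):
            table.append(self.free_blocks.pop())
        return True

    def free(self, seq_id) -> None:
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
```

Block-level allocation is what makes the memory-efficiency work in this role tractable: fragmentation is bounded by one partially filled block per sequence, and freed blocks are immediately reusable by other requests.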