Inference Engineer (AI)
Job description
TL;DR
Inference Engineer (AI): Build and optimize the runtime layer for next-generation AI systems, with an emphasis on high-performance AI compute and hardware efficiency. The focus is on designing production inference pipelines, optimizing KV cache systems, and resolving latency bottlenecks across distributed infrastructure.
Location: Onsite – San Francisco, CA
Salary: $200,000–$300,000 base + meaningful equity
Company
An AI infrastructure startup building a platform for next-generation AI systems. The company recently raised an $80M Series A and has reached eight-figure revenue.
What you will do
- Design and optimize production inference pipelines.
- Improve batching, scheduling, concurrency, and runtime behavior.
- Optimize KV cache systems and memory efficiency.
- Debug latency and throughput bottlenecks across model and systems layers.
- Partner closely with compiler, kernel, and distributed systems engineers.
- Contribute to large-scale distributed inference infrastructure.
Requirements
- Hands-on experience building and scaling production ML inference systems.
- Experience owning inference or model serving infrastructure end-to-end.
- Strong understanding of distributed systems and runtime behavior under load.
- Strong Python and/or C++ skills.
- Experience optimizing latency, throughput, batching, and memory efficiency.
- Must be based in San Francisco, CA, or able to work onsite there.
Nice to have
- Experience with TensorRT-LLM, vLLM, or custom inference runtimes.
- CUDA, kernel optimization, or compiler-adjacent systems experience.
- Experience optimizing GPU utilization at scale.
- Background in AI infrastructure or high-performance compute systems.
Culture & Benefits
- Meaningful equity in a fast-growing stealth startup.
- Opportunity to work with a world-class engineering team.
- High-ownership environment with direct impact on AI infrastructure.
- Focus on cutting-edge AI compute and distributed systems.