Inference Engineer (AI)
Job description
TL;DR
Inference Engineer (AI): Build and optimize large-scale inference infrastructure for next-generation AI workloads, with an emphasis on runtime performance, memory efficiency, and distributed systems orchestration. The role focuses on KV cache management, request scheduling, and low-latency model serving under production load.
Location: Must be based in the United States (San Francisco, CA)
Salary: $140,000–$325,000
Company
is partnering with an AI infrastructure company focused on building high-performance systems for large-scale AI model execution.
What you will do
- Design and optimize large-scale inference pipelines for production environments.
- Improve system latency, throughput, and concurrency under heavy real-world load.
- Build and maintain inference runtimes and serving infrastructure.
- Optimize request orchestration, batching, and scheduling strategies.
- Manage KV cache allocation, reuse, and eviction strategies to maximize memory efficiency.
- Profile and resolve performance bottlenecks across model, runtime, and distributed layers.
Requirements
- Strong systems engineering fundamentals.
- Experience building or scaling ML inference and model serving systems.
- Deep understanding of performance optimization and memory behavior.
- Proficiency with runtimes such as vLLM, TensorRT-LLM, or custom serving infrastructure.
- Strong understanding of transformer architectures and attention mechanisms.
- Strong Python and/or C++ engineering skills.
Culture & Benefits
- Work on cutting-edge inference infrastructure and foundational AI systems.
- Join a small, highly technical engineering team.
- Significant ownership and opportunity for high technical impact.
- Build systems designed for next-generation AI scale.