Inference Optimization Engineer (Edge AI)
Job description
TL;DR
Inference Optimization Engineer (Edge AI): Optimize inference engines such as llama.cpp and vLLM for local and edge hardware, with an emphasis on latency, throughput, and memory efficiency. The focus is on tuning the KV cache and continuous batching, and on implementing quantization strategies for efficient local AI execution.
Location: Hybrid (US: Santa Clara, Phoenix, Folsom, or Hillsboro)
Salary: $170,500 - $315,490
Company
The company's Client Computing Group develops PC products and platforms to deliver purposeful computing experiences.
What you will do
- Profile and optimize local inference (llama.cpp with the Vulkan backend, vLLM) for latency and throughput on edge hardware.
- Tune KV cache, continuous batching, and scheduling for interactive agent workloads.
- Drive and validate quantization strategies (GGUF, AWQ, GPTQ) and assess quality impact.
- Reduce CPU overhead and improve model load, startup, and lifecycle management.
- Benchmark performance across hardware tiers and publish comparative analyses.
- Contribute patches and fixes to open-source inference engines.
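To illustrate what "continuous batching" means in the tuning work above, here is a minimal simulation of iteration-level scheduling. It is a sketch under stated assumptions, not how llama.cpp or vLLM implement it: the model call is omitted, prefill is ignored, and each request stops after a fixed `max_new_tokens` instead of an end-of-sequence token. `Request` and `run_continuous_batching` are hypothetical names. The key idea is that finished sequences leave the batch after every decode step and waiting requests take their slots immediately, rather than waiting for the whole batch to drain.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List


@dataclass
class Request:
    req_id: str
    max_new_tokens: int  # hypothetical stop condition: a fixed token count instead of EOS
    generated: int = 0


def run_continuous_batching(requests: List[Request], max_batch: int) -> List[List[str]]:
    """Simulate iteration-level scheduling; return the request ids active at each decode step."""
    waiting: Deque[Request] = deque(requests)
    active: List[Request] = []
    trace: List[List[str]] = []
    while waiting or active:
        # Fill free slots before every step, not only when the batch is empty.
        while waiting and len(active) < max_batch:
            active.append(waiting.popleft())
        trace.append([r.req_id for r in active])
        # One decode step: each active sequence emits one token (model call omitted).
        for r in active:
            r.generated += 1
        # Retire finished sequences right away so their slots free up for the next step.
        active = [r for r in active if r.generated < r.max_new_tokens]
    return trace


if __name__ == "__main__":
    reqs = [Request("a", 2), Request("b", 4), Request("c", 1)]
    for step, ids in enumerate(run_continuous_batching(reqs, max_batch=2), start=1):
        print(step, ids)
```

With a batch size of 2, request `c` starts as soon as `a` finishes, so the three requests complete in 4 decode steps; a static batcher that waits for both `a` and `b` to finish before admitting `c` would need 5.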
Requirements
- BS/MS in CS, EE, Math, or a related STEM field.
- 5+ years of software development experience.
- Strong proficiency in C++ and/or Python, with the ability to read systems-level code.
- Deep understanding of LLM inference (attention, KV cache, decoding).
- Proven experience profiling and resolving performance problems on CPU or GPU.
- Expertise in Linux, build systems, and low-level debugging.
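As a concrete example of the KV-cache understanding the requirements ask for, the memory a KV cache needs can be estimated from the model shape: keys and values (factor 2) for every layer, KV head, head dimension, and cached token. The sketch below assumes a hypothetical configuration (32 layers, 8 KV heads under grouped-query attention, head dimension 128, fp16 cache) that is illustrative only and not taken from any particular model. `KVCacheConfig`, `kv_cache_bytes`, and `max_tokens_for_budget` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KVCacheConfig:
    """Hypothetical model/cache shape; the values used below are illustrative."""
    n_layers: int
    n_kv_heads: int        # KV heads; fewer than query heads under grouped-query attention
    head_dim: int
    bytes_per_elem: float  # 2 for fp16/bf16, 1 for an 8-bit cache


def kv_bytes_per_token(cfg: KVCacheConfig) -> float:
    # Factor 2: one K vector and one V vector per layer and KV head.
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * cfg.bytes_per_elem


def kv_cache_bytes(cfg: KVCacheConfig, seq_len: int, batch_size: int = 1) -> int:
    """Bytes needed to cache K and V for seq_len tokens in each of batch_size sequences."""
    return int(kv_bytes_per_token(cfg) * seq_len * batch_size)


def max_tokens_for_budget(cfg: KVCacheConfig, budget_bytes: int) -> int:
    """Total cached tokens (summed over all concurrent sequences) that fit in a memory budget."""
    return int(budget_bytes // kv_bytes_per_token(cfg))


if __name__ == "__main__":
    fp16 = KVCacheConfig(n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2)
    print(kv_cache_bytes(fp16, seq_len=4096) / 2**20, "MiB")  # 512.0 MiB for one 4K-token sequence
    print(max_tokens_for_budget(fp16, budget_bytes=2**30))    # 8192 tokens fit in 1 GiB
```

Under these assumptions, one 4,096-token sequence needs 512 MiB of fp16 cache, and halving `bytes_per_elem` with an 8-bit cache halves that figure. This is why cache precision and the token budget shared across concurrently batched sequences are central tuning knobs on memory-constrained edge hardware.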
Nice to have
- Hands-on experience with llama.cpp, vLLM, or ggml.
- Experience with GPU/accelerator programming (Vulkan, CUDA, SYCL, Metal) or SIMD kernels.
- Familiarity with the quality trade-offs of different quantization formats.
- Open-source contributions to inference engines.
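To make the quantization trade-offs mentioned above concrete, here is a simplified sketch of block-wise symmetric 8-bit quantization. It is loosely modeled on the block layout of ggml's Q8_0 format (blocks of 32 values, one scale per block, scale = max |x| / 127), but it is not ggml's implementation: ggml stores the scale in fp16 and packs the data differently. `quantize_q8_blocks`, `dequantize_q8_blocks`, and `max_abs_error` are hypothetical helper names.

```python
import math
from typing import List, Tuple

BLOCK_SIZE = 32  # assumption: mirrors the 32-element blocks used by ggml's Q8_0

Block = Tuple[float, List[int]]  # (scale, int8 codes in [-127, 127])


def quantize_q8_blocks(values: List[float], block_size: int = BLOCK_SIZE) -> List[Block]:
    """Symmetric per-block quantization: each block gets its own scale."""
    blocks: List[Block] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0 if amax > 0 else 0.0
        inv = 1.0 / scale if scale else 0.0
        # Round to the nearest code and clamp to the symmetric int8 range.
        codes = [max(-127, min(127, round(v * inv))) for v in block]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8_blocks(blocks: List[Block]) -> List[float]:
    """Reconstruct approximate floats as scale * code."""
    out: List[float] = []
    for scale, codes in blocks:
        out.extend(scale * c for c in codes)
    return out


def max_abs_error(a: List[float], b: List[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    weights = [math.sin(0.37 * i) * (1 + i % 5) for i in range(100)]
    blocks = quantize_q8_blocks(weights)
    restored = dequantize_q8_blocks(blocks)
    print(len(blocks), "blocks, max abs error:", max_abs_error(weights, restored))
```

Because each block has its own scale, a single outlier only coarsens the 32 values around it, and the rounding error per value is at most half that block's scale. A reconstruction error like this is only a rough proxy for quality; judging the real impact of a format requires model-level evaluation such as perplexity or task benchmarks.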
Culture & Benefits
- Competitive total compensation package including stock bonuses.
- Comprehensive health, retirement, and vacation programs.
- Hybrid work model allowing split time between on-site and off-site work.