TL;DR
Applied AI Engineer (Inference): Build and maintain high-performance model serving capabilities, with an emphasis on benchmarking, profiling, and optimizing LLM inference stacks. The role focuses on resolving complex performance bottlenecks, driving targeted optimizations for real-world production workloads, and ensuring low latency and high throughput on modern GPU hardware.
Location: Hybrid (San Francisco, CA / Sunnyvale, CA / Bellevue, WA) or Remote (US only).
Salary: $165,000 – $242,000
Company
An industry-leading AI infrastructure and development platform provider focused on accelerating the full AI lifecycle from training to inference.
What you will do
- Build and maintain benchmarking workflows to measure latency, throughput, and quality regressions.
- Profile model-serving behavior across runtimes and hardware to identify performance bottlenecks.
- Drive targeted optimization efforts using techniques like quantization, speculative decoding, and batching.
- Partner with platform engineers to productionize improvements and automate performance testing.
- Produce clear technical reports and recommendations on model configurations and hardware allocation.
- Design and run experiments to balance model quality with serving performance.
Requirements
- Must be a U.S. Person (Citizen, Permanent Resident, Refugee, or Asylee) for export control compliance.
- 4+ years of experience in machine learning, systems, or performance engineering.
- Strong programming skills in Python and experience in production environments.
- Familiarity with LLM inference systems such as vLLM, SGLang, or TensorRT-LLM.
- Experience running empirical evaluations and translating data into engineering decisions.
- Strong written communication skills for creating reproducible technical documentation.
Nice to have
- Experience optimizing workloads on modern GPU hardware.
- Familiarity with profiling tools such as Nsight Systems or the PyTorch Profiler.
- Experience using production traces to guide optimization strategies.
Culture & Benefits
- 100% paid medical, dental, and vision insurance.
- 401(k) with generous employer match.
- Flexible PTO and casual work environment.
- Paid parental leave and family-forming support.
- Catered daily lunches at office locations.
- Quarterly team gatherings to support collaboration.