AI Inference Engineer - Model Optimization & Deployment (AI)
Job description
TL;DR
AI Inference Engineer - Model Optimization & Deployment (AI): Optimize and deploy large-scale models (LLMs, VLMs) on power- and thermal-constrained vehicle SoCs, with an emphasis on quantization, mixed-precision inference, and custom CUDA kernels. The role centers on architecting TensorRT pipelines, writing concurrent C++/Python inference code, and ensuring real-time, deterministic execution on edge devices.
Hybrid: Foster City, CA / San Diego, CA / Seattle, WA
$242,000 - $290,000 a year
Company
The company is developing the first ground-up, fully autonomous vehicle fleet and the supporting ecosystem at the intersection of robotics, machine learning, and design.
What you will do
- Optimize large-scale models (LLMs, VLMs) using advanced quantization (PTQ, QAT), mixed-precision inference, and parameter-efficient fine-tuning (LoRA, QLoRA).
- Architect and implement model conversion/compilation pipelines with TensorRT and TensorRT-LLM for edge deployment.
- Perform parity checking, accuracy recovery, and latency benchmarking between PyTorch and compiled edge binaries.
- Write and optimize custom CUDA kernels and TensorRT Plugins for AI accelerators.
- Develop production-level, concurrent, memory-safe C++ and Python code for real-time inference on vehicle SoCs.
Requirements
- Deep expertise in model quantization (PTQ, QAT) and mixed-precision inference (INT8, FP8, INT4, BF16/FP16).
- Proven experience optimizing LLMs/VLMs with KV-cache techniques (PagedAttention), speculative decoding, and FlashAttention.
- Extensive experience with TensorRT/TensorRT-LLM pipelines and parity/latency benchmarking.
- Proficiency in low-level programming: custom CUDA kernels and TensorRT Plugins.
- Production-level C++ (C++14/17/20) and Python for concurrent, real-time edge inference.
Nice to have
- Experience with distributed training (PyTorch Distributed, DeepSpeed, Megatron-LM).
- Familiarity with autonomous driving perception (3D detection, BEV, Occupancy Networks, multi-modal sensors).
- Understanding of end-to-end autonomous driving (VLA models, closed-loop simulation).
Culture & Benefits
- Comprehensive benefits: paid time off (sick leave, vacation, bereavement), unpaid time off.
- Equity: Stock Appreciation Rights, Amazon RSUs; possible sign-on bonus.
- Insurance: health, long-term care, long-term/short-term disability, life insurance.
- Fast-moving, execution-oriented team focused on innovation in autonomous mobility.