Applied AI Engineer (Inference)

$165,000 – $242,000

Work format

Remote (US only) / Hybrid

Employment type

Full-time

Level

Senior

English

Country

Job description

TL;DR

Applied AI Engineer (Inference): build and maintain high-performance model-serving capabilities, with an emphasis on benchmarking, profiling, and optimizing LLM inference stacks. The role focuses on solving complex performance bottlenecks, driving targeted optimizations for real-world production workloads, and ensuring low latency and high throughput on modern GPU hardware.

Location: Hybrid (San Francisco, CA / Sunnyvale, CA / Bellevue, WA) or Remote (US only).

Salary: $165,000 – $242,000

Company

An industry-leading AI infrastructure and development platform provider focused on accelerating the full AI lifecycle from training to inference.

What you will do

Build and maintain benchmarking workflows to measure latency, throughput, and quality regressions.
Profile model-serving behavior across runtimes and hardware to identify performance bottlenecks.
Drive targeted optimization efforts using techniques like quantization, speculative decoding, and batching.
Partner with platform engineers to productionize improvements and automate performance testing.
Produce clear technical reports and recommendations on model configurations and hardware allocation.
Design and run experiments to balance model quality with serving performance.

Requirements

Must be a U.S. Person (Citizen, Permanent Resident, Refugee, or Asylee) for export control compliance.
4+ years of experience in machine learning, systems, or performance engineering.
Strong programming skills in Python and experience working in production environments.
Familiarity with LLM inference systems such as vLLM, SGLang, or TensorRT-LLM.
Experience running empirical evaluations and translating data into engineering decisions.
Strong written communication skills for creating reproducible technical documentation.

Nice to have

Experience optimizing workloads on modern GPU hardware.
Familiarity with profiling tools such as Nsight Systems or the PyTorch Profiler.
Experience using production traces to guide optimization strategies.

Culture & Benefits

100% paid medical, dental, and vision insurance.
401(k) with generous employer match.
Flexible PTO and casual work environment.
Paid parental leave and family-forming support.
Catered daily lunches at office locations.
Quarterly team gatherings to support collaboration.