Company hidden
Posted 1 day ago

Applied AI Engineer (Inference)

Salary: $165,000 – $242,000
Work format: remote (US only)/hybrid
Employment type: full-time
Grade: senior
English: B2
Country: US
Job description

TL;DR

Applied AI Engineer (Inference): Building and maintaining high-performance model serving capabilities with an emphasis on benchmarking, profiling, and optimizing LLM inference stacks. Focus on solving complex performance bottlenecks, driving targeted optimizations for real-world production workloads, and ensuring low latency and high throughput on modern GPU hardware.

Location: Hybrid (San Francisco, CA / Sunnyvale, CA / Bellevue, WA) or Remote (US only).

Salary: $165,000 – $242,000

Company

An industry-leading AI infrastructure and development platform provider focused on accelerating the full AI lifecycle from training to inference.

What you will do

  • Build and maintain benchmarking workflows to measure latency, throughput, and quality regressions.
  • Profile model-serving behavior across runtimes and hardware to identify performance bottlenecks.
  • Drive targeted optimization efforts using techniques like quantization, speculative decoding, and batching.
  • Partner with platform engineers to productionize improvements and automate performance testing.
  • Produce clear technical reports and recommendations on model configurations and hardware allocation.
  • Design and run experiments to balance model quality with serving performance.

Requirements

  • Must be a U.S. Person (Citizen, Permanent Resident, Refugee, or Asylee) for export control compliance.
  • 4+ years of experience in machine learning, systems, or performance engineering.
  • Strong programming skills in Python and experience in production environments.
  • Familiarity with LLM inference systems such as vLLM, SGLang, or TensorRT-LLM.
  • Experience running empirical evaluations and translating data into engineering decisions.
  • Strong written communication skills for creating reproducible technical documentation.

Nice to have

  • Experience optimizing workloads on modern GPU hardware.
  • Familiarity with profiling tools like Nsight Systems or the PyTorch profiler.
  • Experience using production traces to guide optimization strategies.

Culture & Benefits

  • 100% paid medical, dental, and vision insurance.
  • 401(k) with generous employer match.
  • Flexible PTO and casual work environment.
  • Paid parental leave and family-forming support.
  • Catered daily lunches at office locations.
  • Quarterly team gatherings to support collaboration.

The vacancy text is reproduced without changes.