Company hidden
10 hours ago

Inference Engineer (AI)

$200,000–$300,000
Work format
onsite
Employment type
fulltime
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Inference Engineer (AI): Build and optimize the runtime layer for next-generation AI systems, with an emphasis on high-performance AI compute and hardware efficiency. The role focuses on designing production inference pipelines, optimizing KV cache systems, and resolving latency bottlenecks across distributed infrastructure.

Location: Onsite – San Francisco, CA

Salary: $200,000–$300,000 base + meaningful equity

Company

An AI infrastructure startup building a platform for next-generation AI systems. It recently raised an $80M Series A and has reached eight-figure revenue.

What you will do

  • Design and optimize production inference pipelines.
  • Improve batching, scheduling, concurrency, and runtime behavior.
  • Optimize KV cache systems and memory efficiency.
  • Debug latency and throughput bottlenecks across model and systems layers.
  • Partner closely with compiler, kernel, and distributed systems engineers.
  • Contribute to large-scale distributed inference infrastructure.

Requirements

  • Hands-on experience building and scaling production ML inference systems.
  • Experience owning inference or model serving infrastructure end-to-end.
  • Strong understanding of distributed systems and runtime behavior under load.
  • Strong Python and/or C++ skills.
  • Experience optimizing latency, throughput, batching, and memory efficiency.
  • Must be based in, or able to work onsite in, San Francisco, CA.

Nice to have

  • Experience with TensorRT-LLM, vLLM, or custom inference runtimes.
  • CUDA, kernel optimization, or compiler-adjacent systems experience.
  • Experience optimizing GPU utilization at scale.
  • Background in AI infrastructure or high-performance compute systems.

Culture & Benefits

  • Meaningful equity in a fast-growing stealth startup.
  • Opportunity to work in a world-class engineering team.
  • High-ownership environment with direct impact on AI infrastructure.
  • Focus on cutting-edge AI compute and distributed systems.
