Company hidden
16 days ago

Inference Engineer (AI)

$140,000–$325,000
Job type
Full-time
Grade
Middle/Senior
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Inference Engineer (AI): Build and optimize large-scale inference infrastructure for next-generation AI workloads, with an emphasis on runtime performance, memory efficiency, and distributed systems orchestration. The role focuses on KV cache management, request scheduling, and low-latency model serving under production load.

Location: Must be based in the United States (San Francisco, CA)

Salary: $140,000–$325,000

Company

hirify.global is partnering with an AI infrastructure company focused on building high-performance systems for large-scale AI model execution.

What you will do

  • Design and optimize large-scale inference pipelines for production environments.
  • Improve system latency, throughput, and concurrency under heavy real-world load.
  • Build and maintain inference runtimes and serving infrastructure.
  • Optimize request orchestration, batching, and scheduling strategies.
  • Manage KV cache allocation, reuse, and eviction strategies to maximize memory efficiency.
  • Profile and resolve performance bottlenecks across model, runtime, and distributed layers.

Requirements

  • Strong systems engineering fundamentals.
  • Experience building or scaling ML inference and model serving systems.
  • Deep understanding of performance optimization and memory behavior.
  • Proficiency with runtimes such as vLLM, TensorRT-LLM, or custom serving infrastructure.
  • Strong understanding of transformer architectures and attention mechanisms.
  • Strong Python and/or C++ engineering skills.

Culture & Benefits

  • Work on cutting-edge inference infrastructure and foundational AI systems.
  • Join a small, highly technical engineering team.
  • Significant ownership and opportunity for high technical impact.
  • Build systems designed for next-generation AI scale.
