Company hidden
5 days ago

AI Researcher (Inference Optimization)

Work format
remote (Global)
Employment type
full-time
Seniority
senior
English
B2
Vacancy from Hirify.Global, a list of international tech companies

Job description


TL;DR

AI Researcher (Inference Optimization): Design, evaluate, and deploy high-performance inference systems for large-scale machine learning models, with an emphasis on model architecture, systems engineering, and hardware-aware optimization. The focus is on researching and implementing model-level and systems-level optimizations that improve latency, throughput, memory efficiency, and cost per inference.

Location: Remote (worldwide)

Company

hirify.global develops optimized inference systems for production AI environments.

What you will do

  • Research and develop techniques to optimize inference performance for large neural networks.
  • Improve latency, throughput, memory efficiency, and cost per inference.
  • Design and evaluate model-level optimizations like quantization, pruning, KV-cache optimization, and architecture simplifications.
  • Implement systems-level optimizations including dynamic batching, kernel fusion, multi-GPU inference, and prefill vs decode strategies.
  • Benchmark inference workloads across hardware accelerators and collaborate on deploying optimized pipelines.
  • Translate research insights into production-ready improvements.
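To make one of the systems-level techniques above concrete, here is a minimal sketch of dynamic batching in plain Python. The `DynamicBatcher` class and its request format are illustrative assumptions for this posting, not part of any specific framework:

```python
# Illustrative sketch (not any company's implementation): a dynamic
# batcher that groups queued inference requests into batches, bounded
# by a maximum batch size, trading a little latency for throughput.
from collections import deque

class DynamicBatcher:
    def __init__(self, max_batch_size):
        self.max_batch_size = max_batch_size
        self.queue = deque()

    def submit(self, request):
        # Enqueue an incoming request.
        self.queue.append(request)

    def next_batch(self):
        # Drain up to max_batch_size queued requests into one batch.
        batch = []
        while self.queue and len(batch) < self.max_batch_size:
            batch.append(self.queue.popleft())
        return batch

batcher = DynamicBatcher(max_batch_size=4)
for i in range(6):
    batcher.submit(f"req-{i}")

batches = []
while True:
    batch = batcher.next_batch()
    if not batch:
        break
    batches.append(batch)
# 6 requests with max_batch_size=4 yield two batches: 4 then 2.
```

Production batchers (for example, the continuous batching in serving frameworks such as vLLM or Triton) typically also bound the wait time and account for sequence lengths, but the core idea is the same: amortize per-call overhead by running many requests through the model together.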

Requirements

  • Strong background in machine learning, deep learning, or AI systems.
  • Hands-on experience optimizing inference for large-scale models.
  • Proficiency in Python and modern ML frameworks (e.g., PyTorch).
  • Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).
  • Ability to design experiments and communicate results clearly.
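As a small example of the kind of experiment design and measurement this role involves, here is a hedged sketch of a latency benchmark that reports p50/p99 percentiles; `mock_infer` is a stand-in for a real model forward pass:

```python
# Illustrative sketch: measure per-call latency for an inference
# function and report median (p50) and tail (p99) percentiles.
import time

def mock_infer(x):
    # Placeholder for a real model call.
    return x * 2

def benchmark(fn, inputs):
    latencies = []
    for x in inputs:
        start = time.perf_counter()
        fn(x)
        latencies.append(time.perf_counter() - start)
    latencies.sort()
    n = len(latencies)
    # Simple nearest-rank percentiles over the sorted samples.
    p50 = latencies[n // 2]
    p99 = latencies[min(n - 1, int(n * 0.99))]
    return {"n": n, "p50_s": p50, "p99_s": p99}

stats = benchmark(mock_infer, range(1000))
```

Real inference benchmarks would also control for warmup, GPU synchronization, and batch size, but tail percentiles rather than averages are the usual headline metric because they capture worst-case user-visible latency.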

Nice to have

  • Experience deploying production inference systems at scale.
  • Familiarity with distributed and multi-GPU inference.
  • Experience contributing to open-source ML or inference frameworks.
  • Authorship or co-authorship of peer-reviewed research papers in machine learning or systems.
  • Experience working close to hardware (CUDA, ROCm, profiling tools).

Culture & Benefits

  • Work in a collaborative environment focused on measurable impact in production systems.
  • Opportunity to translate research into real-world deployments.
  • Emphasis on clear benchmarks, thorough documentation, and research that informs product decisions.
