AI Researcher (Inference Optimization)
Job description
TL;DR
AI Researcher (Inference Optimization): Design, evaluate, and deploy high-performance inference systems for large-scale machine learning models, with an emphasis on model architecture, systems engineering, and hardware-aware optimization. The focus is on researching and implementing model-level and systems-level optimizations to improve latency, throughput, memory efficiency, and cost per inference.
Location: Remote (worldwide)
Company
The company develops optimized inference systems for production AI environments.
What you will do
- Research and develop techniques to optimize inference performance for large neural networks.
- Improve latency, throughput, memory efficiency, and cost per inference.
- Design and evaluate model-level optimizations like quantization, pruning, KV-cache optimization, and architecture simplifications.
- Implement systems-level optimizations including dynamic batching, kernel fusion, multi-GPU inference, and prefill vs decode strategies.
- Benchmark inference workloads across hardware accelerators and collaborate on deploying optimized pipelines.
- Translate research insights into production-ready improvements.
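As an illustration of the model-level optimizations named above (quantization in particular), here is a minimal, self-contained sketch of symmetric per-tensor int8 weight quantization. This is a toy example for orientation only, not the company's pipeline; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q.

    The scale maps the largest absolute weight to 127, so the
    int8 grid covers the full dynamic range of the tensor.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 codes."""
    return q.astype(np.float32) * scale

# Round-trip a small weight vector and check the reconstruction error,
# which is bounded by half the quantization step (scale / 2).
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

Real inference stacks (e.g., TensorRT or vLLM) apply far more sophisticated schemes such as per-channel scales and activation calibration, but the scale/round/clip structure above is the common core.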
Requirements
- Strong background in machine learning, deep learning, or AI systems.
- Hands-on experience optimizing inference for large-scale models.
- Proficiency in Python and modern ML frameworks (e.g., PyTorch).
- Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).
- Ability to design experiments and communicate results clearly.
Nice to have
- Experience deploying production inference systems at scale.
- Familiarity with distributed and multi-GPU inference.
- Experience contributing to open-source ML or inference frameworks.
- Authorship or co-authorship of peer-reviewed research papers in machine learning or systems.
- Experience working close to hardware (CUDA, ROCm, profiling tools).
Culture & Benefits
- Work in a collaborative environment focused on measurable impact in production systems.
- Opportunity to translate research into real-world deployments.
- Emphasis on clear benchmarks, documentation, and informing product decisions.