AI Researcher (Inference Optimization)
Job description
TL;DR
AI Researcher (Inference Optimization): Design, evaluate, and deploy high-performance inference systems for large-scale machine learning models, with an emphasis on model architecture, systems engineering, and hardware-aware optimization. The focus is on researching and implementing model-level and systems-level optimizations to improve latency, throughput, memory efficiency, and cost per inference.
Location: Remote (worldwide)
Company
The company develops optimized inference systems for production AI environments.
What you will do
- Research and develop techniques to optimize inference performance for large neural networks.
- Improve latency, throughput, memory efficiency, and cost per inference.
- Design and evaluate model-level optimizations like quantization, pruning, KV-cache optimization, and architecture simplifications.
- Implement systems-level optimizations including dynamic batching, kernel fusion, multi-GPU inference, and prefill vs decode strategies.
- Benchmark inference workloads across hardware accelerators and collaborate on deploying optimized pipelines.
- Translate research insights into production-ready improvements.
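As an illustration of the model-level optimizations named above (quantization in particular), here is a minimal, self-contained sketch of symmetric per-tensor int8 weight quantization. This is a toy example for orientation only, not the company's pipeline; the function names are hypothetical.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q.

    The scale maps the largest absolute weight to 127, so the
    int8 grid covers the full dynamic range of the tensor.
    """
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float tensor from int8 codes."""
    return q.astype(np.float32) * scale

# Round-trip a small weight vector and check the reconstruction error,
# which is bounded by half the quantization step (scale / 2).
w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
```

Real inference stacks (e.g., TensorRT or vLLM) apply far more sophisticated schemes such as per-channel scales and activation calibration, but the scale/round/clip structure above is the common core.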
Requirements
- Strong background in machine learning, deep learning, or AI systems.
- Hands-on experience optimizing inference for large-scale models.
- Proficiency in Python and modern ML frameworks (e.g., PyTorch).
- Experience with inference tooling (e.g., Triton, TensorRT, vLLM, ONNX Runtime).
- Ability to design experiments and communicate results clearly.
Nice to have
- Experience deploying production inference systems at scale.
- Familiarity with distributed and multi-GPU inference.
- Experience contributing to open-source ML or inference frameworks.
- Authorship or co-authorship of peer-reviewed research papers in machine learning or systems.
- Experience working close to hardware (CUDA, ROCm, profiling tools).
Culture & Benefits
- Work in a collaborative environment focused on measurable impact in production systems.
- Opportunity to translate research into real-world deployments.
- Emphasis on clear benchmarks, documentation, and informing product decisions.