Posted 2 months ago
Machine Learning Engineer (AI)
Job description
TL;DR
Machine Learning Engineer (AI): Develop and optimize high-performance model inference systems with an emphasis on latency, throughput, and cost efficiency. The role focuses on profiling GPU/CPU pipelines, implementing quantization techniques, and productionizing cutting-edge model architectures at real-world scale.
Location: Remote (World)
Company
The company is a Series A startup focused on pushing the limits of model inference performance at scale.
What you will do
- Optimize inference latency, throughput, and cost for large-scale ML models in production.
- Profile and resolve bottlenecks in GPU/CPU inference pipelines, including memory, kernels, batching, and IO.
- Implement and tune advanced techniques such as quantization (fp16, bf16, int8, fp8), KV-cache optimization, and speculative decoding.
- Collaborate with research engineers to transition new model architectures into production-grade systems.
- Build and maintain inference-serving systems using Triton, custom runtimes, or bespoke stacks.
- Benchmark performance across diverse hardware (NVIDIA/AMD GPUs, CPUs) and cloud environments.
Requirements
- Strong experience in ML inference optimization or high-performance ML systems.
- Deep understanding of deep learning internals, including attention mechanisms, memory layout, and compute graphs.
- Hands-on proficiency with PyTorch and experience in model deployment.
- Familiarity with GPU performance tuning using CUDA, ROCm, Triton, or kernel-level optimizations.
- Proven experience scaling inference for real users beyond research benchmarks.
- Ability to work in a fast-moving startup environment with high ownership and ambiguity.
Nice to have
- Experience with LLM or long-context model inference.
- Knowledge of inference frameworks such as TensorRT, ONNX Runtime, vLLM, or Triton.
- Experience optimizing across different hardware vendors.
- Contributions to open-source ML systems or inference tooling.
- Background in distributed systems or low-latency services.
Culture & Benefits
- Real ownership over performance-critical systems with direct impact on unit economics.
- Competitive compensation package including meaningful equity at Series A.
- Close collaboration with research, infrastructure, and product teams.
- Engineering culture that prioritizes technical quality over hype.