Company hidden
Posted 2 months ago

Machine Learning Engineer (AI)

Work format: remote (Global)
Job type: full-time
English: B2
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Machine Learning Engineer (AI): Develop and optimize high-performance model inference systems with an emphasis on latency, throughput, and cost efficiency. The role focuses on profiling GPU/CPU pipelines, implementing quantization techniques, and productionizing cutting-edge model architectures at real-world scale.

Location: Remote (World)

Company

hirify.global is a Series A startup focused on pushing the limits of model inference performance at scale.

What you will do

  • Optimize inference latency, throughput, and cost for large-scale ML models in production.
  • Profile and resolve bottlenecks in GPU/CPU inference pipelines, including memory, kernels, batching, and IO.
  • Implement and tune advanced techniques such as reduced-precision and quantized inference (fp16, bf16, int8, fp8), KV-cache optimization, and speculative decoding.
  • Collaborate with research engineers to transition new model architectures into production-grade systems.
  • Build and maintain inference-serving systems using Triton, custom runtimes, or bespoke stacks.
  • Benchmark performance across diverse hardware (NVIDIA/AMD GPUs, CPUs) and cloud environments.

Requirements

  • Strong experience in ML inference optimization or high-performance ML systems.
  • Deep understanding of deep learning internals, including attention mechanisms, memory layout, and compute graphs.
  • Hands-on proficiency with PyTorch and experience in model deployment.
  • Familiarity with GPU performance tuning using CUDA, ROCm, Triton, or kernel-level optimizations.
  • Proven experience scaling inference for real users beyond research benchmarks.
  • Ability to work in a fast-moving startup environment with high ownership and ambiguity.

Nice to have

  • Experience with LLM or long-context model inference.
  • Knowledge of inference frameworks such as TensorRT, ONNX Runtime, vLLM, or Triton.
  • Experience optimizing across different hardware vendors.
  • Contributions to open-source ML systems or inference tooling.
  • Background in distributed systems or low-latency services.

Culture & Benefits

  • Real ownership over performance-critical systems with direct impact on unit economics.
  • Competitive compensation package including meaningful equity at Series A.
  • Close collaboration with research, infrastructure, and product teams.
  • Engineering culture that prioritizes technical quality over hype.

Be careful: if an employer asks you to log in to their system with iCloud/Google, send a code or password, or run code or software, do not do it. These are scammers. Be sure to click "Report" or contact support.