Company hidden
2 months ago

Senior/Lead Machine Learning Engineer (AI)

Job type
Full-time
Grade
Senior/Lead
English
C1
Country
Serbia
Relocation
US
Job from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Senior/Lead Machine Learning Engineer (AI): Develop and optimize high-performance serving infrastructure for realtime multimodal models, with an emphasis on inference acceleration and distributed systems. The role focuses on reducing latency, implementing advanced serving frameworks such as vLLM, and scaling multi-GPU inference to thousands of concurrent queries.

Location: Must be based in Serbia. Future relocation to the US (San Francisco Bay Area) may be available with visa support.

Company

hirify.global is a product-oriented research lab developing best-in-class realtime multimodal models and a high-throughput orchestration platform.

What you will do

  • Optimize inference using modern serving frameworks such as vLLM or TRT-LLM.
  • Implement model acceleration via quantization, distillation, caching strategies, and speculative decoding.
  • Build high-performance systems using C++, CUDA, Rust, or optimized Python.
  • Scale distributed systems using Kubernetes and Ray for multi-GPU/multi-node inference.
  • Own the full cycle: taking models from research, containerizing them, and ensuring reliable production serving.

Requirements

  • Deep expertise in inference optimization and modern serving techniques.
  • Proficiency in high-performance languages (C++, CUDA, Rust) or highly optimized Python.
  • Experience with Kubernetes, Ray, and handling thousands of concurrent connections.
  • Professional fluency in English (written and spoken) is required.
  • Must be located in Serbia.

Nice to have

  • PhD in CS, Physics, Math, or equivalent practical experience building backend/ML systems.
  • Contributions to major open-source inference engines.
  • Non-trivial systems programming projects or deep-dive technical write-ups.

Culture & Benefits

  • Flat organizational structure with fast iterations and minimal process theater.
  • Engineering culture that values impact and stability over purely theoretical optimizations.
  • High ownership environment where performance, latency, and reliability are first-class features.
  • Support for sharing work and making open-source contributions.
  • Potential for future US visa and relocation support to the San Francisco Bay Area.
