Company hidden
Posted 7 days ago

Lead ML Systems Engineer (Voice AI)

Work format: hybrid
Employment type: full-time
Grade: lead
English: B2
Country: Armenia
Vacancy from Hirify RU Global, a list of companies with Eastern European roots.

Job description

TL;DR

Lead ML Systems Engineer (Voice AI): own the architecture, performance, and scalability of hirify.global Cloud's real-time Voice AI serving infrastructure. The role centers on turning state-of-the-art research models into highly optimized, reliable, and cost-efficient production systems that power latency-sensitive, mission-critical Voice AI services, with an emphasis on deep systems thinking and long-term architectural ownership.

Location: Hybrid in Armenia

Company

hirify.global develops AI-powered voice clarity software.

What you will do

  • Prototype, implement, and benchmark critical components of the serving stack.
  • Architect and implement inference and serving strategies that define how models are packaged, deployed, replicated, batched, scheduled, and optimized under real-time constraints.
  • Partner with Research and Platform teams to drive deep performance optimization across runtime, precision (FP16/INT8/FP8), batching strategies, and GPU execution.
  • Lead root cause analysis of systemic performance regressions and implement structural improvements.
  • Drive alignment between model design and production constraints, ensuring research translates into performant, scalable, cost-effective systems.
  • Shape the long-term architectural direction for Voice AI serving infrastructure through both implementation and strategic design.
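The batching and scheduling work listed above can be illustrated with a minimal sketch. The `MicroBatcher` class below is hypothetical, not any particular serving framework's API: it collects requests and flushes a batch either when it is full or when a latency deadline expires, which is the core trade-off behind dynamic batching under real-time constraints.

```python
import queue
import threading
import time

class MicroBatcher:
    """Collect requests; flush when the batch fills or a deadline expires.

    Illustrative only: a production serving stack also handles padding,
    per-request deadlines, backpressure, and GPU stream scheduling.
    """

    def __init__(self, infer_fn, max_batch=8, max_wait_ms=10.0):
        self.infer_fn = infer_fn              # runs inference on a list of inputs
        self.max_batch = max_batch
        self.max_wait = max_wait_ms / 1000.0  # flush deadline in seconds
        self.q = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def submit(self, x):
        """Enqueue one request; returns (completion event, result holder)."""
        done, holder = threading.Event(), {}
        self.q.put((x, done, holder))
        return done, holder

    def _loop(self):
        while True:
            batch = [self.q.get()]            # block until the first request
            deadline = time.monotonic() + self.max_wait
            while len(batch) < self.max_batch:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.q.get(timeout=remaining))
                except queue.Empty:
                    break
            outputs = self.infer_fn([x for x, _, _ in batch])
            for (_, done, holder), out in zip(batch, outputs):
                holder["result"] = out
                done.set()
```

For example, `MicroBatcher(lambda xs: [x * 2 for x in xs])` batches concurrent `submit` calls into single `infer_fn` invocations; in a real system `infer_fn` would be a GPU forward pass, where larger batches improve throughput at the cost of queueing latency.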

Requirements

  • 5+ years building performance-critical backend or distributed systems.
  • Hands-on experience deploying and operating ML inference systems in production environments.
  • Experience working on latency-sensitive or real-time services.
  • Strong systems background (distributed systems, networking, concurrency, performance engineering).
  • Hands-on experience deploying and optimizing GPU-based inference systems in production (TensorRT or similar runtimes; graph optimization, precision tuning, memory optimization, CUDA-level profiling).
  • Strong programming skills in Python and/or C++.
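The precision-tuning requirement (FP16/INT8/FP8) ultimately trades numeric fidelity for throughput and memory. A small NumPy sketch of symmetric per-tensor INT8 quantization, illustrative rather than any specific runtime's calibration scheme, shows the kind of error being managed:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: w ~= scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map the int8 codes back to float32 for error measurement."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(1024).astype(np.float32)

q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).max()
# Rounding error is bounded by half a quantization step (scale / 2)
assert err <= scale / 2 + 1e-6
```

Real deployments refine this with per-channel scales, calibration data, and sensitivity analysis to decide which layers stay in higher precision.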

Nice to have

  • Experience optimizing ASR or TTS systems for real-time production workloads.
  • Experience with streaming inference and low-latency (<200ms) systems.
  • Experience building cost-efficient inference infrastructure at scale.
  • Familiarity with CUDA internals or custom kernel optimization.
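The sub-200 ms streaming target mentioned above usually means processing audio in small fixed-size frames and tracking per-frame latency against a budget. A minimal sketch with a stub model step (frame size and budget are assumptions taken from common streaming ASR/TTS setups and the posting's figure):

```python
import time

FRAME_MS = 20      # typical frame size for streaming audio front-ends (assumption)
BUDGET_MS = 200    # per-frame latency budget, per the posting's "<200ms"

def stream_infer(frames, model_step):
    """Run a per-frame model step and record each frame's latency in ms."""
    latencies = []
    for frame in frames:
        t0 = time.perf_counter()
        model_step(frame)
        latencies.append((time.perf_counter() - t0) * 1000.0)
    return latencies

# 50 frames of 20 ms audio at 16 kHz, 16-bit mono (320 bytes each);
# the lambda is a stand-in for a real streaming network's step function.
frames = [bytes(320) for _ in range(50)]
lat = stream_infer(frames, model_step=lambda f: sum(f))

p99 = sorted(lat)[int(0.99 * len(lat)) - 1]
assert p99 < BUDGET_MS   # real systems track tail latency, not just the mean
```

In production the check runs continuously and feeds autoscaling and regression alerts, since tail latency, not average latency, determines whether a voice session stays real-time.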

Culture & Benefits

  • Equal Opportunity Employer.
  • We treat each other with respect and empathy.

Be careful: if an employer asks you to sign in to their system via iCloud/Google, send a code or password, or run code/software, do not do it; these are scammers. Be sure to click "Report" or contact support. See the guide for details →

The vacancy text is reproduced without changes.
