TL;DR
Lead ML Systems Engineer (Voice AI): Own the architecture, performance, and scalability of hirify.global Cloud's real-time Voice AI serving infrastructure. The role focuses on transforming state-of-the-art research models into highly optimized, reliable, and cost-efficient production systems that power latency-sensitive, mission-critical Voice AI services, and calls for deep systems thinking and long-term architectural ownership.
Location: Hybrid in Armenia
Company
hirify.global develops AI-powered voice clarity software.
What you will do
- Prototype, implement, and benchmark critical components of the serving stack.
- Architect and implement inference and serving strategies that define how models are packaged, deployed, replicated, batched, scheduled, and optimized under real-time constraints.
- Partner with Research and Platform teams to drive deep performance optimization across runtime, precision (FP16/INT8/FP8), batching strategies, and GPU execution.
- Lead root cause analysis of systemic performance regressions and implement structural improvements.
- Drive alignment between model design and production constraints, ensuring research translates into performant, scalable, cost-effective systems.
- Shape the long-term architectural direction for Voice AI serving infrastructure through both implementation and strategic design.
Requirements
- 5+ years building performance-critical backend or distributed systems.
- Hands-on experience deploying and operating ML inference systems in production environments.
- Experience working on latency-sensitive or real-time services.
- Strong systems background (distributed systems, networking, concurrency, performance engineering).
- Hands-on experience deploying and optimizing GPU-based inference systems in production (TensorRT or similar runtimes; graph optimization, precision tuning, memory optimization, CUDA-level profiling).
- Strong programming skills in Python and/or C++.
Nice to have
- Experience optimizing ASR or TTS systems for real-time production workloads.
- Experience with streaming inference and low-latency (<200ms) systems.
- Experience building cost-efficient inference infrastructure at scale.
- Familiarity with CUDA internals or custom kernel optimization.
Culture & Benefits
- We are an Equal Opportunity Employer.
- We treat each other with respect and empathy.