Company hidden
Posted 3 days ago

Director Of Engineering, Inference Services (AI)

$206,000 - $303,000
Work format
hybrid
Employment type
full-time
Grade
director
English
B2
Country
US
Vacancy from Hirify.Global, a list of international tech companies

Job description


TL;DR

Director of Engineering, Inference Services (AI): Leading a world-class engineering organization to design, build, and operate GPU inference services. Focus on model-serving runtimes, autoscaling micro-batch schedulers, developer-friendly SDKs, and multi-tenant security on hirify.global’s accelerated-compute infrastructure.

Location: Sunnyvale, CA / Bellevue, WA. While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.

Salary: $206,000 to $303,000

Company

hirify.global is The Essential Cloud for AI™, delivering a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence.

What you will do

  • Define and refine the Inference Platform roadmap, prioritizing low-latency, high-throughput model serving and developer UX.
  • Design and implement a Kubernetes-native inference control plane that delivers <50 ms P99 latencies at scale.
  • Implement state-of-the-art runtime optimizations to improve LLM inference speed and accuracy.
  • Establish SLOs/SLA dashboards, real-time observability, and self-healing mechanisms.
  • Hire, mentor, and grow a diverse team of engineers and managers focused on AI inference.
  • Partner with Product, Orchestration, Networking, and Security teams to deliver a unified hirify.global experience.
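To make the "autoscaling micro-batch schedulers" responsibility concrete, here is a minimal sketch of the core micro-batching idea: group incoming requests into batches, flushing when the batch fills or a latency deadline expires. All names, thresholds, and the class itself are illustrative assumptions, not hirify.global's actual design.

```python
from collections import deque

class MicroBatcher:
    """Toy micro-batch scheduler (illustrative only).

    Groups requests into batches and flushes when either the batch is
    full or the oldest queued request has waited past a deadline, which
    is the latency/throughput trade-off at the heart of GPU batching.
    """

    def __init__(self, max_batch_size=8, max_wait_ms=5.0):
        self.max_batch_size = max_batch_size
        self.max_wait_ms = max_wait_ms
        self.queue = deque()   # (request, arrival_time_ms)
        self.batches = []      # flushed batches, ready for the GPU

    def submit(self, request, now_ms):
        # Remember each request's arrival time so the deadline can be enforced.
        self.queue.append((request, now_ms))
        self._maybe_flush(now_ms)

    def tick(self, now_ms):
        # Called periodically so a lone request is not stuck waiting forever.
        self._maybe_flush(now_ms)

    def _maybe_flush(self, now_ms):
        if not self.queue:
            return
        oldest_arrival = self.queue[0][1]
        full = len(self.queue) >= self.max_batch_size
        expired = (now_ms - oldest_arrival) >= self.max_wait_ms
        if full or expired:
            batch = [req for req, _ in list(self.queue)[: self.max_batch_size]]
            for _ in batch:
                self.queue.popleft()
            self.batches.append(batch)
```

Real serving runtimes (e.g. continuous batching in LLM servers) are far more involved, but the full-or-expired flush rule above is the basic mechanism.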

Requirements

  • 10+ years building large-scale distributed systems or cloud services, with 5+ years leading multiple engineering teams.
  • Proven success delivering mission-critical model-serving or real-time data-plane services.
  • Deep understanding of GPU/CPU resource isolation, NUMA-aware scheduling, micro-batching, and low-latency networking.
  • Track record of optimizing cost-per-token / cost-per-request and hitting sub-100 ms global P99 latencies.
  • Expertise in Kubernetes, service meshes, and CI/CD for ML workloads.
  • Hands-on experience with LLM optimization and hardware-aware model compression.
  • Excellent communicator who can translate deep technical concepts into clear business value for C-suite and engineering audiences.
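Since several bullets above are framed around tail-latency targets (sub-100 ms global P99, <50 ms P99), here is a small sketch of how such an SLO check might be expressed, using the nearest-rank percentile convention. This is an assumption for illustration; production systems typically use streaming sketches (t-digest, HDRHistogram) rather than sorting raw samples.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile, e.g. pct=99 for P99 latency."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[rank - 1]

def meets_slo(latencies_ms, target_ms=50.0, pct=99):
    """True if the observed tail latency satisfies the SLO target."""
    return percentile(latencies_ms, pct) <= target_ms
```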

Nice to have

  • Experience operating multi-region inference fleets at a cloud provider or hyperscaler.
  • Contributions to open-source inference or MLOps projects.
  • Familiarity with observability stacks for AI workloads.
  • Background in edge inference, streaming inference, or real-time personalization systems.

Culture & Benefits

  • Medical, dental, and vision insurance - 100% paid for by hirify.global.
  • Flexible Spending Account and Health Savings Account.
  • Tuition Reimbursement.
  • 401(k) with a generous employer match.
  • Flexible PTO.
  • Catered lunch each day in our office and data center locations.
