Company hidden
Posted 3 days ago

Director Of Engineering, Inference Services (AI)

$206,000 - $303,000
Work format
hybrid
Employment type
full-time
Grade
director
English
B2
Country
US
Vacancy from Hirify.Global, a list of international tech companies

Job description


TL;DR

Director of Engineering, Inference Services (AI): Leading a world-class engineering organization to design, build, and operate GPU inference services. Focus on model-serving runtimes, autoscaling micro-batch schedulers, developer-friendly SDKs, and multi-tenant security on hirify.global’s accelerated-compute infrastructure.

Location: Sunnyvale, CA / Bellevue, WA. While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.

Salary: $206,000 to $303,000

Company

hirify.global is The Essential Cloud for AI™, delivering a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence.

What you will do

  • Define and refine the Inference Platform roadmap, prioritizing low-latency, high-throughput model serving and developer UX.
  • Design and implement a Kubernetes-native inference control plane that delivers <50 ms P99 latencies at scale.
  • Implement state-of-the-art runtime optimizations to improve LLM inference speed and accuracy.
  • Establish SLOs/SLA dashboards, real-time observability, and self-healing mechanisms.
  • Hire, mentor, and grow a diverse team of engineers and managers focused on AI inference.
  • Partner with Product, Orchestration, Networking, and Security teams to deliver a unified hirify.global experience.
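To make the "autoscaling micro-batch schedulers" responsibility concrete, here is a minimal sketch of the core micro-batching idea: group incoming requests into batches, flushing when the batch fills or a latency deadline expires. All names, thresholds, and the class itself are illustrative assumptions, not hirify.global's actual design.

```python
from collections import deque

class MicroBatcher:
    """Toy micro-batch scheduler (illustrative only).

    Groups requests into batches and flushes when either the batch is
    full or the oldest queued request has waited past a deadline, which
    is the latency/throughput trade-off at the heart of GPU batching.
    """

    def __init__(self, max_batch_size=8, max_wait_ms=5.0):
        self.max_batch_size = max_batch_size
        self.max_wait_ms = max_wait_ms
        self.queue = deque()   # (request, arrival_time_ms)
        self.batches = []      # flushed batches, ready for the GPU

    def submit(self, request, now_ms):
        # Remember each request's arrival time so the deadline can be enforced.
        self.queue.append((request, now_ms))
        self._maybe_flush(now_ms)

    def tick(self, now_ms):
        # Called periodically so a lone request is not stuck waiting forever.
        self._maybe_flush(now_ms)

    def _maybe_flush(self, now_ms):
        if not self.queue:
            return
        oldest_arrival = self.queue[0][1]
        full = len(self.queue) >= self.max_batch_size
        expired = (now_ms - oldest_arrival) >= self.max_wait_ms
        if full or expired:
            batch = [req for req, _ in list(self.queue)[: self.max_batch_size]]
            for _ in batch:
                self.queue.popleft()
            self.batches.append(batch)
```

Real serving runtimes (e.g. continuous batching in LLM servers) are far more involved, but the full-or-expired flush rule above is the basic mechanism.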

Requirements

  • 10+ years building large-scale distributed systems or cloud services, with 5+ years leading multiple engineering teams.
  • Proven success delivering mission-critical model-serving or real-time data-plane services.
  • Deep understanding of GPU/CPU resource isolation, NUMA-aware scheduling, micro-batching, and low-latency networking.
  • Track record of optimizing cost-per-token / cost-per-request and hitting sub-100 ms global P99 latencies.
  • Expertise in Kubernetes, service meshes, and CI/CD for ML workloads.
  • Hands-on experience with LLM optimization and hardware-aware model compression.
  • Excellent communicator who can translate deep technical concepts into clear business value for C-suite and engineering audiences.
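Since several bullets above are framed around tail-latency targets (sub-100 ms global P99, <50 ms P99), here is a small sketch of how such an SLO check might be expressed, using the nearest-rank percentile convention. This is an assumption for illustration; production systems typically use streaming sketches (t-digest, HDRHistogram) rather than sorting raw samples.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile, e.g. pct=99 for P99 latency."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100.0 * len(ordered))
    return ordered[rank - 1]

def meets_slo(latencies_ms, target_ms=50.0, pct=99):
    """True if the observed tail latency satisfies the SLO target."""
    return percentile(latencies_ms, pct) <= target_ms
```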

Nice to have

  • Experience operating multi-region inference fleets at a cloud provider or hyperscaler.
  • Contributions to open-source inference or MLOps projects.
  • Familiarity with observability stacks for AI workloads.
  • Background in edge inference, streaming inference, or real-time personalization systems.

Culture & Benefits

  • Medical, dental, and vision insurance - 100% paid for by hirify.global.
  • Flexible Spending Account and Health Savings Account.
  • Tuition Reimbursement.
  • 401(k) with a generous employer match.
  • Flexible PTO.
  • Catered lunch each day in our office and data center locations.
