Senior Software Engineer, ML Infrastructure (AI)

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Software Engineer, ML Infrastructure (AI): Designing and operating scalable GPU-backed infrastructure for AI systems, with an emphasis on high-throughput inference and distributed serving patterns. Focus on optimizing GPU occupancy, implementing model parallelism, and building observability for production-grade ML workloads.

Location: Hybrid (San Francisco, CA)

Company

hirify.global provides blockchain analytics and AI solutions to help agencies and financial institutions detect and disrupt crypto-related financial crime.

What you will do

Design and operate GPU cluster infrastructure in cloud environments (AWS/GCP), including orchestration and autoscaling.
Optimize high-throughput inference by tuning serving systems for maximum token throughput and cost-effectiveness.
Operationalize distributed inference strategies, including model and tensor parallelism for large-scale models.
Integrate acceleration stacks like TensorRT, ONNX Runtime, vLLM, and FlashAttention to reduce inference costs.
Manage heterogeneous workloads across accelerators (e.g., NVIDIA GPUs, Inferentia) to ensure predictable performance.
Develop observability tools to monitor GPU load, memory utilization, and batching efficiency.

Requirements

5+ years of experience building and operating distributed systems or infrastructure in production.
Experience deploying ML/LLM inference workloads on GPU clusters within AWS or GCP.
Deep knowledge of high-throughput inference systems, batching strategies, and latency/cost trade-offs.
Proficiency with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, or ONNX Runtime.
Experience with Kubernetes or equivalent orchestration systems in cloud environments.
Bachelor’s degree in Computer Science or a related field.

Nice to have

Familiarity with heterogeneous accelerators such as Inferentia.
CUDA familiarity and experience debugging GPU-related issues.

Culture & Benefits

High-velocity, high-ownership environment focused on experimentation and rapid shipping.
Mission-driven work at the intersection of AI, national security, and fighting financial crime.
Expectation of "AI fluency" to accelerate workflows and solve problems.
Culture based on leadership principles: Impact-Oriented Trailblazer, Master Craftsperson, and Inspiring Colleague.
Distributed-first company structure with global hubs.
