Company hidden
4 hours ago

Senior Software Engineer, ML Infrastructure (AI)

Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US
Listing from Hirify.Global, a list of international tech companies

Job description

TL;DR

Senior Software Engineer, ML Infrastructure (AI): Design and operate scalable GPU-backed infrastructure for AI systems, with an emphasis on high-throughput inference and distributed serving patterns. The role focuses on optimizing GPU occupancy, implementing model parallelism, and building observability for production-grade ML workloads.

Location: Hybrid (San Francisco, CA)

Company

hirify.global provides blockchain analytics and AI solutions to help agencies and financial institutions detect and disrupt crypto-related financial crime.

What you will do

  • Design and operate GPU cluster infrastructure in cloud environments (AWS/GCP), including orchestration and autoscaling.
  • Optimize high-throughput inference by tuning serving systems for maximum token throughput and cost-effectiveness.
  • Operationalize distributed inference strategies, including model and tensor parallelism for large-scale models.
  • Integrate acceleration stacks like TensorRT, ONNX Runtime, vLLM, and FlashAttention to reduce inference costs.
  • Manage heterogeneous workloads across accelerators (e.g., NVIDIA GPUs, Inferentia) to ensure predictable performance.
  • Develop observability tools to monitor GPU load, memory utilization, and batching efficiency.

Requirements

  • 5+ years of experience building and operating distributed systems or infrastructure in production.
  • Experience deploying ML/LLM inference workloads on GPU clusters within AWS or GCP.
  • Deep knowledge of high-throughput inference systems, batching strategies, and latency/cost trade-offs.
  • Proficiency with ML serving frameworks such as Triton Inference Server, vLLM, Ray Serve, or ONNX Runtime.
  • Experience with Kubernetes or equivalent orchestration systems in cloud environments.
  • Bachelor’s degree in Computer Science or a related field.

Nice to have

  • Familiarity with heterogeneous accelerators such as Inferentia.
  • CUDA familiarity and experience debugging GPU-related issues.

Culture & Benefits

  • High-velocity, high-ownership environment focused on experimentation and rapid shipping.
  • Mission-driven work at the intersection of AI, national security, and fighting financial crime.
  • "AI fluency" is expected: using AI tools to accelerate workflows and solve problems.
  • Culture based on leadership principles: Impact-Oriented Trailblazer, Master Craftsperson, and Inspiring Colleague.
  • Distributed-first company structure with global hubs.
