Company hidden
Posted 5 days ago

AI Infrastructure Engineer (HPC)

Job type: Full-time
Grade: Senior
English: C1
Country: Singapore
Listing from Hirify Global, a list of international tech companies.

Job description

TL;DR

AI Infrastructure Engineer (GPU/HPC): Operate and optimize GPU clusters and implement elastic scheduling for inference and training, with an emphasis on high-throughput serving, distributed communication stacks, and unified orchestration. The role focuses on tuning vLLM/SGLang runtimes, optimizing NCCL/RDMA communication, and building observability for GPU utilization.

Location: Singapore, SG

Company

The hiring company (name hidden in this listing) is a world-leading technology company specializing in Bitcoin mining solutions and AI cloud services, operating a global portfolio of HPC datacenters.

What you will do

  • Operate and optimize GPU clusters using Kubernetes, Slurm, and Ray across multiple regions.
  • Implement elastic scheduling and unified orchestration for inference and training jobs using Kueue, NVIDIA KAI Scheduler, or KEDA.
  • Manage and tune vLLM and SGLang runtimes for high-throughput, low-latency serving, focusing on continuous batching and KV-cache paging.
  • Optimize distributed communication stacks, including NCCL/RCCL, RDMA over RoCEv2, and InfiniBand.
  • Benchmark and profile performance across various model sizes (7B to 70B+) and precisions (FP8, AWQ, GPTQ).
  • Build observability stacks with Prometheus, Grafana, and OpenTelemetry to monitor GPU utilization and latency.
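
As a rough illustration of the benchmarking item above, the sketch below builds a model-size by precision test matrix and estimates the memory needed for the weights alone. It is a back-of-the-envelope calculation, not the company's method. The bits-per-weight table is an assumption (AWQ and GPTQ are taken as 4-bit, a common configuration that the listing does not state), and the names `BITS_PER_WEIGHT`, `weight_memory_gb`, and `benchmark_matrix` are hypothetical. KV cache, activations, and runtime overhead are deliberately ignored.

```python
from itertools import product

# Assumed bits per weight for each precision named in the listing.
# AWQ and GPTQ are treated as 4-bit here; other bit widths exist.
BITS_PER_WEIGHT = {"FP8": 8, "AWQ": 4, "GPTQ": 4}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    bytes_per_weight = BITS_PER_WEIGHT[precision] / 8
    # (params_billions * 1e9 weights) * bytes_per_weight / 1e9 bytes per GB
    return params_billions * bytes_per_weight


def benchmark_matrix(sizes_billions, precisions):
    """Return (size, precision, estimated weight GB) for every combination."""
    return [
        (size, prec, weight_memory_gb(size, prec))
        for size, prec in product(sizes_billions, precisions)
    ]


if __name__ == "__main__":
    for size, prec, gb in benchmark_matrix([7, 70], ["FP8", "AWQ", "GPTQ"]):
        print(f"{size}B @ {prec}: ~{gb:.1f} GB of weights")
```

An estimate like this only helps decide which combinations fit on a given GPU before running real throughput and latency measurements.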

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or a related field (PhD preferred).
  • 4–8+ years of experience in backend engineering, distributed systems, platform engineering, or applied AI.
  • Strong proficiency in Python, plus experience with Go, TypeScript, Rust, or C++.
  • Hands-on experience with Kubernetes, Slurm, or Ray.
  • Strong background in PyTorch/JAX and distributed communication stacks (NCCL/RCCL, RDMA).
  • Fluent in English.

Nice to have

  • Experience with major cloud platforms and designing production-grade architectures.
  • Familiarity with retrieval systems, embeddings, and vector stores like Qdrant, Chroma, or pgvector.
  • Experience with agent frameworks, tool-calling, function orchestration, or MCP.

Culture & Benefits

  • Opportunity to work in a global environment with datacenters in the US, Bhutan, Norway, Canada, Malaysia, and Ethiopia.
  • Exposure to world-leading technology in ASIC chip design and HPC cloud capabilities.
  • Commitment to equal employment opportunities and a diverse, inclusive workplace.
