
Software Engineer (GPU Networking)

$150,000–$250,000
Work format: onsite
Employment type: full-time
Seniority: senior
English: B2
Country: US

Job description


TL;DR

Software Engineer (GPU Networking): Build and optimize high-performance GPU networking and distributed systems for AI inference, with an emphasis on integrating RDMA capabilities and co-optimizing communication alongside computation. Focus on architecting the software fabric that unifies thousands of GPUs, enabling serverless-grade startup speeds for LLMs, and deep-diving into bleeding-edge hardware performance.

Location: Onsite in San Francisco, US

Salary: $150,000–$250,000 annually, with equity

Company

hirify.global is a fast-growing product company that powers mission-critical AI inference for leading AI companies.

What you will do

  • Integrate RDMA/RoCE/InfiniBand capabilities directly into the inference stack to achieve order-of-magnitude improvements in bandwidth and latency.
  • Implement and tune networking layers for efficient Disaggregated KV Cache Offload and Wide Expert Parallelism (WideEP) for MoE models.
  • Enable sub-10-second startup for trillion-parameter models by working deeply with checkpointing and storage mechanisms.
  • Characterize and validate networking performance on bleeding-edge GPU clusters (H100/H200, B200/B300, GB200/300 NVL72).
  • Design tools to visualize packet flow, congestion, and effective bandwidth across GPU interconnects for diagnosing distributed system behaviors.
  • Work with communication libraries (NCCL, NVSHMEM) and potentially write custom communication kernels to overlap compute and data transfer.
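The last bullet refers to a standard pattern: while the GPU computes on chunk i, the next chunk i+1 is already in flight over the network, so transfer time hides behind compute time. In production this is done with NCCL/NVSHMEM and CUDA streams; the sketch below illustrates only the overlap structure in plain Python with a background thread and a one-slot queue as the "double buffer". The names `fetch`, `compute`, and `pipelined` are hypothetical stand-ins, not part of any real library.

```python
import threading
from queue import Queue

def fetch(chunk_id):
    # Stand-in for a network/RDMA transfer of one data chunk.
    return list(range(chunk_id * 4, chunk_id * 4 + 4))

def compute(chunk):
    # Stand-in for a GPU kernel running over one chunk.
    return sum(chunk)

def pipelined(num_chunks):
    """Overlap the 'transfer' of chunk i+1 with the 'compute' of chunk i."""
    q = Queue(maxsize=1)  # one prefetched chunk at a time (double buffering)

    def producer():
        for i in range(num_chunks):
            q.put(fetch(i))   # blocks while the consumer is still computing
        q.put(None)           # sentinel: no more chunks

    t = threading.Thread(target=producer)
    t.start()
    results = []
    while (chunk := q.get()) is not None:
        results.append(compute(chunk))
    t.join()
    return results

print(pipelined(3))  # [6, 22, 38]
```

With real kernels and real transfers, the total runtime of this shape approaches max(transfer, compute) per chunk instead of their sum, which is the payoff the bullet describes.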

Requirements

  • Deep experience with high-performance networking protocols (InfiniBand, RoCE v2).
  • Proficiency in C++ or Python, with the ability to bridge high-level logic and hardware.
  • Deep understanding of the memory hierarchy in modern NVIDIA architectures (H100/Blackwell) and optimization skills.
  • Ability to deep-dive into TensorRT-LLM source code, write custom C++/Python bindings, or debug NVLink topology issues.
  • Proven ability to build custom solutions when off-the-shelf tools are insufficient for performance needs.
  • Work onsite in San Francisco, US.

Nice to have

  • Deep knowledge of NCCL, NVSHMEM, and UCX.
  • Experience with GPUDirect Storage (GDS) or high-performance filesystems like Weka or 3FS.
  • Familiarity with TensorRT-LLM, vLLM, or Sglang.
  • Experience running low-level benchmarks to qualify new hardware clusters.

Culture & Benefits

  • Competitive compensation, including meaningful equity.
  • 100% coverage of medical, dental, and vision insurance for employees and their dependents.
  • Generous PTO policy, including a company-wide Winter Break.
  • Paid parental leave.
  • Company-facilitated 401(k).
  • Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.
  • Opportunity to work with bleeding-edge hardware like Blackwell (B200/B300) and Rubin architectures.

