Company hidden
6 days ago

Senior Principal Network Engineer (AI Infrastructure)

Work format
remote (USA only)
Employment type
full-time
Grade
senior
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Principal Network Engineer (AI Infrastructure): Designing and evolving large-scale InfiniBand and RoCE fabric architectures for high-performance GPU clusters, with an emphasis on reliability, scalability, and long-term evolution. Focus on resolving complex network incidents, making fabric performance more predictable, and defining hardware configuration standards for AI interconnects.

Location: Must be based in the US

Company

hirify.global is a GPU cloud provider engineered for AI, offering high-performance infrastructure for AI start-ups and large enterprise customers.

What you will do

  • Own the technical direction and operational strategy for AI interconnect networks.
  • Design and evolve large-scale InfiniBand and RoCE fabric architectures to support growth.
  • Act as the senior escalation point for complex network incidents and drive systemic fixes.
  • Drive cross-team initiatives to improve fabric reliability and performance predictability.
  • Define standards for hardware configuration, routing, congestion control, and firmware management.
  • Mentor senior and mid-level network engineers to raise operational rigor.

Requirements

  • 12+ years of experience in network engineering with a focus on HPC, AI, or hyperscale data centers.
  • Expert-level operational and architectural experience with InfiniBand and/or large-scale RoCE fabrics.
  • Deep understanding of RDMA internals, congestion management, and fabric-level failure modes.
  • Strong expertise in modern data center routing and control planes (BGP, OSPF, ECMP).
  • Ability to debug cross-layer issues spanning hardware, firmware, kernel, and application libraries.
  • Must be based in the US.

Nice to have

  • Extensive experience with NVIDIA/Mellanox networking platforms in production AI/HPC environments.
  • Familiarity with distributed training frameworks and GPU communication patterns.
  • Experience designing network observability systems for high-throughput environments.

Culture & Benefits

  • Highly competitive package including base salary and equity.
  • Performance reviews conducted every 12 months.
  • Workplace autonomy and a human-first approach to flexibility.
  • Remote-first team environment with a culture of innovation and ownership.
  • Dynamic progression plan tailored to individual ambitions.