Company hidden
Posted 19 hours ago

Principal Network Architect (AI Infrastructure)

Work format: remote (Global)
Employment type: full-time
Grade: senior
English: B2
Country: US
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Principal Network Architect (AI Infrastructure): Design and manage high-performance RDMA, InfiniBand, and RoCE fabrics for a global GPU cloud, with an emphasis on reliability, scalability, and operational excellence. The role focuses on driving automation frameworks, solving complex cross-layer networking issues, and defining long-term interconnect strategy.

Location: Remote (Global). Geography is no barrier to impact or connection.

Company

hirify.global is a GPU cloud provider engineered for AI, offering cost-effective, high-performance infrastructure for AI startups and large enterprise customers.

What you will do

  • Lead the technical direction and operational lifecycle of high-performance RDMA network fabrics.
  • Define long-term architecture, reliability strategies, and operational standards for AI interconnect networks.
  • Design, build, and evolve large-scale InfiniBand and RoCE fabrics across globally distributed GPU clusters.
  • Develop and scale automation frameworks for network provisioning, validation, and low-touch operations.
  • Drive deep debugging and resolution of complex cross-layer issues spanning hardware, firmware, and the kernel.
  • Coordinate complex technical initiatives across Network, SRE, Compute, and Platform teams.

Requirements

  • 10+ years of experience in network engineering within hyperscale, AI, or HPC environments.
  • Deep expertise in RDMA, InfiniBand, and/or large-scale RoCE fabrics.
  • Expert-level knowledge of data center routing protocols and techniques such as BGP, OSPF, and ECMP.
  • Strong programming skills in Python, Go, or similar for network automation.
  • Proven ability to lead complex technical programs and act as a senior escalation point for production issues.
  • Systems-level thinking to balance performance, reliability, scalability, and cost.

Nice to have

  • Experience with NVIDIA / Mellanox networking platforms.
  • Familiarity with distributed AI training frameworks and GPU communication patterns.
  • Experience building large-scale network observability systems.
  • Background in influencing infrastructure strategy in high-growth environments.

Culture & Benefits

  • Competitive compensation package including base salary and equity.
  • Performance and salary reviews conducted every 12 months.
  • Dynamic career progression plan tailored to individual ambitions.
  • Remote-first culture with Human-First Flexibility and high autonomy.
  • Collaborative and innovative environment within a fast-growing tech startup.
