Назад
Company hidden
7 days ago

Principal Network Engineer (AI Infrastructure)

$150,000 - $250,000
Work format
Remote (Global)
Job type
Full-time
Grade
Principal
English
B2
Вакансия ΠΈΠ· списка Hirify.GlobalВакансия ΠΈΠ· Hirify Global, списка ΠΌΠ΅ΠΆΠ΄ΡƒΠ½Π°Ρ€ΠΎΠ΄Π½Ρ‹Ρ… tech-ΠΊΠΎΠΌΠΏΠ°Π½ΠΈΠΉ
Для мэтча ΠΈ ΠΎΡ‚ΠΊΠ»ΠΈΠΊΠ° Π½ΡƒΠΆΠ΅Π½ Plus

ΠœΡΡ‚Ρ‡ & Π‘ΠΎΠΏΡ€ΠΎΠ²ΠΎΠ΄

Для мэтча с этой вакансиСй Π½ΡƒΠΆΠ΅Π½ Plus

ОписаниС вакансии

ВСкст:
/

TL;DR

Principal Network Engineer (AI Infrastructure): Owning the reliability, scalability, and long-term evolution of InfiniBand and RDMA-based network fabrics for a high-performance GPU cloud, with an emphasis on AI interconnect networks. Focus on designing large-scale fabric architectures, resolving complex incidents, and driving cross-team operational improvements.

Location: Remote-first; geography is no barrier to impact

Salary: $150,000 - $250,000 USD

Company

A GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI startups and enterprises.

What you will do

  • Own technical direction and operational strategy for AI interconnect networks
  • Design, review, and evolve large-scale InfiniBand and RoCE fabric architectures
  • Act as the senior escalation point for complex network incidents and drive systemic fixes
  • Drive cross-team initiatives to improve fabric reliability, performance, and maturity
  • Define standards for hardware, congestion control, routing, firmware, and change safety
  • Partner with SRE, Compute Platform, and Network Architecture teams on system design
  • Mentor engineers and drive improvements in uptime, latency, and efficiency

Requirements

  • 10+ years in network engineering with focus on HPC, AI, or hyperscale data center networking
  • Expert operational and architectural experience with InfiniBand and/or large-scale RoCE fabrics
  • Deep understanding of RDMA internals, congestion management, and fabric failure modes
  • Strong expertise in modern data center routing and control planes (BGP, OSPF, ECMP)
  • Proven ability to debug cross-layer issues across hardware, firmware, kernel, and applications
  • Demonstrated leadership in complex technical initiatives without direct authority
  • Systems-level mindset balancing performance, reliability, scalability, and cost

Nice to have

  • Extensive experience with NVIDIA/Mellanox in production AI or HPC environments
  • Deep familiarity with distributed training frameworks and GPU communication patterns
  • Experience designing network observability for high-cardinality environments
  • Prior experience influencing platform or infrastructure strategy at scale

Culture & Benefits

  • Collaborative, supportive, innovative environment with real impact
  • Competitive package (base + equity) with annual reviews
  • Dynamic progression plan with autonomy and support
  • Human-first flexibility, remote-first team with seamless virtual collaboration
  • Competitive benefits including medical, dental, vision, flexible PTO, parental leave, retirement plan
