Senior Front End Network Engineer (AI Infrastructure)
Job description
TL;DR
Senior Front End Network Engineer (AI Infrastructure): Managing and optimizing large-scale Ethernet front-end networks for AI infrastructure, with an emphasis on fabric stability, incident response, and high-performance networking. The role focuses on troubleshooting complex P0/P1 incidents, automating network provisioning, and tuning latency and throughput for AI workloads.
Location: Remote (Geography is no barrier to impact or connection)
Company
The company is a GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI start-ups and large enterprises.
What you will do
- Own the operational health, configuration consistency, and performance tuning of large-scale Ethernet front-end fabrics (leaf-spine / Clos).
- Lead the diagnosis and resolution of complex network incidents (P0/P1) spanning optics, routing, switching hardware, and storage connectivity.
- Drive blameless postmortems and implement preventative fixes to improve long-term fabric stability and availability.
- Partner with SREs to define requirements for automation and contribute to network provisioning, validation, and monitoring systems.
- Collaborate with Network Architecture teams to validate designs and enforce standards for routing, congestion management, and firmware baselines.
- Monitor fabric utilization and performance to identify bottlenecks and ensure predictable latency and throughput.
Requirements
- 5+ years of experience in network engineering, with at least 3 years operating large-scale Ethernet data centre or cloud networks.
- Deep, hands-on operational experience with Arista (EOS) and/or Nokia platforms.
- Strong expertise in BGP, OSPF, ECMP, EVPN-VXLAN, and leaf-spine architectures.
- Proven experience with long-haul circuits and DCI (dark fiber, carrier Ethernet, coherent optics).
- Proficiency in Python, Go, or shell scripting for automation and data analysis.
- Experience working in a 24/7 operational environment with a strong focus on reliability.
Nice to have
- Extensive experience with Arista or Nokia platforms at scale.
- Familiarity with front-end network patterns specifically for large AI clusters.
- Experience operating large-scale DCI / long-haul optical or carrier networks.
- Background in network observability and telemetry systems (Prometheus, Grafana, sFlow).
- Prior experience in automation-first network operations.
Culture & Benefits
- Competitive compensation package including base salary and equity with annual reviews.
- Opportunity to join a fast-growing tech startup working on cutting-edge AI infrastructure.
- Dynamic progression plan tailored to personal ambitions and ownership.
- Human-first flexibility with a remote-first approach and high autonomy over your workday.