Network Reliability Engineer (AI)

$210,000 – $240,000

Work format

onsite

Employment type

fulltime

Grade

senior

English

Country

This vacancy is from Hirify Global, a list of international tech companies

Job description

TL;DR

Network Reliability Engineer (AI): Designing and operating the global network and reliability layer for a high-performance private supercomputer, with an emphasis on distributed compute, ML workloads, and real-time analytics. Focus on building scalable network architecture, automating infrastructure, and ensuring mission-critical system reliability.

Location: Must be based in San Francisco, California (On-site)

Salary: $210,000 – $240,000

Company

hirify.global is a pioneering Causal AI platform helping Fortune 100 enterprises prove business outcomes using trusted, causal evidence.

What you will do

Architect and operate scalable, secure network architecture for large-scale machine learning workloads.
Own network device configuration management end to end to ensure consistency and reliability.
Improve system and network performance through automation, observability, and proactive capacity planning.
Implement and manage complex network protocols including BGP, VPNs, and external peering.
Build and maintain comprehensive monitoring, alerting, and incident response systems.
Partner across engineering and data science to drive a culture of performance and reliability.

Requirements

8+ years in network or infrastructure engineering, with 5+ years in datacenter operations.
Extensive hands-on experience with network devices (firewalls, switches, load balancers) and protocols like BGP, QoS, MPLS, and IPsec.
Experience designing and operating modern datacenter network fabrics (spine-leaf, EVPN/VXLAN, ECMP).
Proficiency in network automation and IaC tooling (Ansible, Terraform, Nornir) and IPAM/DCIM platforms.
Strong operational experience with Linux-based production infrastructure and Kubernetes networking.
Solid scripting skills in Python or Bash for debugging and automation.

Nice to have

Experience with NVIDIA networking technologies (Cumulus Linux, InfiniBand, Spectrum-X, BlueField DPUs).
Familiarity with data-intensive platforms like Spark, Airflow, or Kafka.
Experience with storage networking protocols and file systems such as NFS, Lustre, or iSCSI.
Background in high-compliance or SOC 2 environments.

Culture & Benefits

Work on cutting-edge infrastructure including one of the world's fastest private supercomputers.
High-impact role with ownership over architecture decisions for Fortune 100-scale systems.
Generous equity program to ensure meaningful ownership.
Transparent compensation philosophy based on real-time market data.
Collaborative environment with top-tier engineering talent.
