Network Engineer (Supercomputing)
Job Description
TL;DR
Network Engineer (Supercomputing): Own the lowest layers of the network stack for large-scale AI training and inference, with an emphasis on RDMA/RoCE fabric reliability and NVLink/NVSwitch interconnects. The role focuses on debugging production collectives, building instrumentation tooling, and driving technical resolutions with cloud providers to ensure fleet reliability at multi-thousand-GPU scale.
Location: Must be based in San Francisco, California
Compensation: $350,000 - $475,000 USD
Company
The company is an AI research organization dedicated to advancing collaborative general intelligence and building accessible tools for the AI community.
What you will do
- Validate and reason about GPU network fabric design across large-scale deployments.
- Debug RDMA/RoCEv2 and NCCL failures, as well as congestion control behavior, across NIC vendors.
- Manage NVLink/NVSwitch interconnects, including fabric manager health and link error diagnostics.
- Develop host-level network instrumentation, dashboards, and automated alerts.
- Triage complex issues across NIC, driver, kernel, switch, and workload boundaries.
- Drive escalations with cloud-provider networking teams to ensure end-to-end resolution.
Requirements
- Must be based in San Francisco, California
- Bachelor’s degree or equivalent experience in computer science or engineering.
- Proficiency in Python or Rust.
- Experience operating large-scale clusters with orchestration and scheduling systems such as Kubernetes or Slurm.
- Ability to own projects end-to-end and thrive in a cross-functional environment.
- Visa sponsorship is available for qualified candidates.
Culture & Benefits
- Generous health, dental, and vision insurance coverage.
- Unlimited PTO and paid parental leave.
- Relocation support provided as needed.
- Opportunity to work on cutting-edge AI infrastructure at massive scale.