Software Development Engineer (AI)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Software Development Engineer (AI): Designing and maintaining cloud-based repair and recovery workflows for NVIDIA GB200/GB300 UltraServers with an accent on AI/ML infrastructure reliability and scalability. Focus on architecting complex diagnostic systems, automating hardware recovery, and collaborating across engineering teams to ensure high availability.
Location: Seattle, WA, USA (Onsite)
Salary: $143,700 – $194,400 USD annually
Company
A global technology leader providing cloud infrastructure, e-commerce, and artificial intelligence solutions.
What you will do
- Design and build cloud-based workflows for repair and recovery of GPU-intensive UltraServers.
- Develop automated triage for hardware testing, diagnostic, and network partition validation.
- Create observable systems with appropriate metrics and alarm architectures.
- Collaborate with capacity management, hardware engineering, and datacenter teams to drive operational excellence.
- Troubleshoot complex workflow failures and manage multi-node GPU cluster configurations.
Requirements
- 3+ years of professional software development experience.
- 2+ years of experience in system design or architecture (reliability and scaling).
- Proficiency in at least one software programming language.
- Must be based in Seattle, WA area.
Culture & Benefits
- Comprehensive US health insurance (medical, dental, vision).
- 401(k) retirement matching.
- Paid time off and parental leave.
- Includes sign-on bonuses and restricted stock units (RSUs).
- Supportive, inclusive environment with dedicated workplace accommodations for employees with disabilities.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →