Software Development Engineer (AI/ML)
Job description
TL;DR
Software Development Engineer (AI/ML): Designing and building cloud-based repair and recovery workflows for NVIDIA GB200/GB300 UltraServers, with an emphasis on system architecture and AWS native services. Focus on automating diagnostic triage, managing GPU cluster network partitions, and ensuring high availability of AI/ML infrastructure.
Location: USA, WA, Seattle
Salary: $143,700–$194,400
Company
A global leader in cloud computing and AI infrastructure providing scalable compute capacity via EC2.
What you will do
- Design and architect cross-functional solutions for Capacity Management, Hardware Engineering, and Datacenter Operations.
- Build cloud-based repair and recovery workflows using AWS native services for scaling infrastructure.
- Implement automation for diagnostic triage, hardware testing, and cable validation processes.
- Manage network partition configurations and firmware validation for multi-node GPU clusters.
- Develop observable systems with appropriate metrics and alarming to monitor UltraServer workflows.
- Collaborate with stakeholders to convert business requirements into technical designs.
Requirements
- 3+ years of non-internship professional software development experience.
- 2+ years of experience designing or architecting new and existing systems, with a focus on reliability and scaling.
- Proficiency in at least one software programming language.
- Must be based in or able to work from Seattle, WA.
Nice to have
- 3+ years of full SDLC experience, including coding standards, code reviews, and CI/CD.
- Bachelor's degree in Computer Science or an equivalent qualification.
- Deep knowledge of professional software engineering best practices and operational excellence.
Culture & Benefits
- Comprehensive health insurance including medical, dental, vision, and prescription coverage.
- 401(k) matching program.
- Paid time off and parental leave.
- Inclusive environment with dedicated workplace accommodation and adjustment support.
- Opportunity to work on cutting-edge NVIDIA-based ML infrastructure at scale.