Hardware Engineer (GPU & PCIe)
Job Description
TL;DR
Hardware Engineer (GPU & PCIe): Design, develop, and optimize server hardware infrastructure, with an emphasis on GPU and PCIe troubleshooting. The role focuses on automating the server hardware lifecycle, performing failure analysis on H100/NVLink systems, and integrating observability platforms.
Location: Hybrid (New York, NY / Sunnyvale, CA / Bellevue, WA) or Remote for candidates located more than 30 miles from an office. Must be a U.S. person (Citizen, Green Card holder, etc.) due to export control regulations.
Salary: $102,000 – $145,000
Company
The company describes itself as The Essential Cloud for AI, providing a platform of technology, tools, and teams that enables innovators to build and scale AI with superior infrastructure performance.
What you will do
- Troubleshoot complex GPU and PCIe related failures and partner with external vendors on failure analysis.
- Develop and maintain hardware/firmware management services and automate all aspects of the server hardware lifecycle.
- Serve as the senior point of contact for hardware escalation and troubleshooting.
- Collaborate with cross-functional teams to define hardware requirements, system architecture, and resolution playbooks.
- Analyze hardware performance, identify bottlenecks, and propose improvements for enhanced efficiency.
- Create and maintain detailed documentation of hardware designs, specifications, and test procedures.
Requirements
- 2+ years of experience supporting and troubleshooting data-center-class GPUs (H100 or newer), including InfiniBand and NVLink.
- Proficiency in Python and Ansible for programmatically interacting with server BMCs using Redfish or IPMI.
- Experience automating GPU diagnostics and troubleshooting tools using observability platforms like Prometheus and Grafana.
- In-depth knowledge of server hardware components, specifically GPUs and PCIe devices.
- Must be a U.S. person (Citizen, Lawful Permanent Resident, Refugee, or Asylee) to comply with U.S. Government export regulations.
Culture & Benefits
- 100% company-paid medical, dental, and vision insurance.
- 401(k) with generous employer match and Employee Stock Purchase Program (ESPP).
- Flexible PTO and company-paid Life, Short-term, and Long-term disability insurance.
- Comprehensive family support including paid parental leave and childcare support via Kinside.
- Daily catered lunch provided at office and data center locations.
- Casual work environment focused on innovative disruption.