TL;DR
Senior Hardware Support Engineer (AI): Leading production hardware reliability across large-scale, mission-critical data center environments with an accent on rapid root cause identification and continuous improvement of server and platform reliability. Focus on driving investigations from symptom to root cause and coordinating resolution across engineering, vendors, and on-site teams.
Location: Remote work within the United States. Occasional travel may be required.
Salary: $125,000 – $180,000 per year plus annual performance-based bonus.
Company
hirify.global is a cloud computing company focused on serving the global AI economy by providing tools and resources for AI/ML solutions.
What you will do
- Lead root cause analysis for complex hardware and firmware failures across production fleets.
- Act as the senior escalation point for hardware-related incidents impacting availability or performance.
- Coordinate with vendors to drive timely diagnostics, RMAs, firmware fixes, and corrective actions.
- Partner with internal engineering teams to validate fixes and prevent recurrence.
- Improve hardware observability, failure tracking, and reporting processes.
- Contribute to long-term hardware reliability strategy and fleet-wide stability improvements.
Requirements
- Strong hands-on expertise with server hardware in data center or large-scale production environments.
- Proven experience performing root cause analysis of hardware and firmware failures.
- Deep understanding of server components (CPU, memory, storage, networking, power, BMC) and failure modes.
- Experience working directly with hardware vendors and engineering teams to resolve production issues.
- Structured problem-solving skills using formal IT or incident management methodologies.
- Clear written and verbal communication skills in cross-functional environments.
Nice to have
- Experience in GPU-dense, AI, or high-performance computing environments.
- Exposure to firmware lifecycle management and large-scale rollout validation.
- Familiarity with Linux-based production systems and infrastructure tooling.
Culture & Benefits
- Comprehensive medical, dental, and vision coverage.
- 401(k) plan with company contribution.
- Flexible paid time off and paid parental leave.
- Professional development support.
- Flexible working arrangements and a dynamic and collaborative work environment.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →