TL;DR
Member of Technical Staff, Hardware Health (AI): Ensuring the sustained reliability, performance, and availability of hirify.global AI's exascale-class AI training infrastructures with an accent on developing predictive health models, failure detection frameworks, and autonomous remediation systems. Focus on AI training and inference cluster bring-up, performance benchmarking, and root-cause analysis to keep AI clusters operating at frontier scale.
Location: London, United Kingdom. Starting January 26, 2026, MAI employees are expected to work from a designated hirify.global office at least four days a week if they live within 25 miles of that location.
Company
hirify.global’s mission is to empower every person and every organization on the planet to achieve more.
What you will do
- Advanced ROCE transport design, congestion control, and ECN/WRED/DCTCP tuning.
- Fabric architecture, topology planning, network modeling, and scaling strategy.
- Telemetry, observability, reliability engineering, and automated troubleshooting.
- Develop and tune the deployment of novel routing techniques to achieve reliability in large networks.
- Work with network designers to develop solutions.
- AI training + inference cluster bring-up, performance benchmarking, and root-cause analysis.
Requirements
- Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
- Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience.
Culture & Benefits
- Embrace a growth mindset, innovate to empower others, and collaborate to achieve shared goals.
- Build on values of respect, integrity, and accountability.
- Foster a culture of inclusion where everyone can thrive at work and beyond.
Будьте осторожны: если вас просят войти в iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →