TL;DR
Member of Technical Staff, Hardware Health (AI): Ensure the sustained reliability, performance, and availability of advanced AI training infrastructure, including multi-gigawatt GPU clusters and ultra-low-latency networks, with an emphasis on predictive health models, failure detection frameworks, and autonomous remediation systems. The role focuses on AI training and inference cluster bring-up, performance benchmarking, and root-cause analysis.
Location: Zurich, Switzerland. Employees who live within 25 miles of the designated hirify.global office must work from it at least four days a week.
Company
hirify.global AI is dedicated to advancing Copilot and other consumer AI products and research.
What you will do
- Advance RoCE transport design, congestion control, and ECN/WRED/DCTCP tuning.
- Plan fabric architecture, topology, network modeling, and scaling strategy.
- Implement telemetry, observability, reliability engineering, and automated troubleshooting.
- Develop, deploy, and tune novel routing techniques that improve reliability in large networks.
- Collaborate with network designers to bring up AI training and inference clusters, benchmark performance, and perform root-cause analysis.
- Gather data and insights to develop the pretraining compute roadmap.
Requirements
- Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in C, C++, C#, Java, JavaScript, or Python, or equivalent experience.
- Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in C, C++, C#, Java, JavaScript, or Python, OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in C, C++, C#, Java, JavaScript, or Python, or equivalent experience.
Culture & Benefits
- Embrace a growth mindset to empower others and collaborate to achieve shared goals.
- Foster a culture of inclusion, respect, integrity, and accountability.
- Push the boundaries of AI toward Humanist Superintelligence to amplify human potential.