Company hidden
3 days ago

Member Of Technical Staff, Hardware Health (AI)

Work format
onsite
Employment type
fulltime
Grade
middle/senior
English
B2
Country
Switzerland
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Member of Technical Staff, Hardware Health (AI): Ensuring sustained reliability, performance, and availability across exascale-class AI training infrastructure deployments, with an emphasis on predictive health models, failure detection frameworks, and autonomous remediation systems. The role involves close collaboration with research, hardware, datacenter, and platform engineering teams to keep AI clusters operating at frontier scale.

Location: Zürich, Switzerland. Starting January 26, 2026, MAI employees are expected to work from a designated Microsoft office at least four days a week if they live within 25 miles of that location.

Company

hirify.global operates one of the world’s most advanced AI training infrastructures, featuring multi-gigawatt clusters spanning tens of thousands of high-performance GPUs, ultra-low-latency NVLink/NVSwitch networks, and innovative liquid-cooling systems.

What you will do

  • Design advanced RoCE transport, congestion control, and ECN/WRED/DCTCP tuning.
  • Plan fabric architecture, topology, network modeling, and scaling strategy.
  • Implement telemetry, observability, reliability engineering, and automated troubleshooting.
  • Develop and tune novel routing techniques to achieve reliability in large networks.
  • Collaborate with network hardware vendors such as NVIDIA and Broadcom, and with in-house silicon/network co-design teams.
  • Participate in AI training and inference cluster bring-up, performance benchmarking, and root-cause analysis.

Requirements

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python OR equivalent experience.

Nice to have

  • Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python; OR Bachelor’s Degree in Computer Science or related technical field AND 12+ years technical engineering experience with coding in languages including C, C++, C#, Java, JavaScript, or Python; OR equivalent experience.

Culture & Benefits

  • Embrace a growth mindset, innovate to empower others, and collaborate to achieve shared goals.
  • Build on values of respect, integrity, and accountability to foster a culture of inclusion.
