Назад
Company hidden
1 день назад

Datacenter Hardware Operations Technician Lead (AI)

86 400 - 228 000$
Формат работы
onsite
Тип работы
fulltime
Грейд
lead
Английский
b2
Страна
US
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

Datacenter Hardware Operations Technician Lead (AI): Serving as the senior on-site technical authority for hardware reliability and fleet health at a flagship AI campus with an accent on diagnosing complex hardware failures and driving root cause analysis. Focus on maintaining world-class operational performance for large-scale GPU, server, and storage infrastructure.

Location: Must be based onsite in Abilene, Texas 5 days per week

Compensation: $86.4K – $228K

Company

hirify.global is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.

What you will do

  • Serve as the senior on-site hardware operations lead for server, GPU, storage, and rack-level infrastructure.
  • Drive technical triage and resolution of complex hardware failures impacting production systems.
  • Partner with Fleet Health Engineering to investigate recurring hardware issues and improve fleet reliability.
  • Lead root cause analysis (RCA) efforts for critical hardware incidents and develop corrective action plans.
  • Collaborate with operations teams and OEM vendors to coordinate repairs, upgrades, and lifecycle activities.
  • Establish and improve hardware maintenance procedures, operational runbooks, and troubleshooting standards.

Requirements

  • 8+ years of experience supporting large-scale datacenter hardware infrastructure.
  • Deep expertise with server platforms, GPU systems, storage infrastructure, and rack integration.
  • Proven experience diagnosing complex hardware failures and leading repair efforts in production environments.
  • Strong understanding of hardware reliability engineering principles and fleet-health management.
  • Ability to partner effectively across engineering, operations, manufacturing, and vendor organizations.
  • Must be able to sit onsite in Abilene, Texas 5 days per week.

Nice to have

  • Experience supporting large-scale GPU clusters or AI/ML infrastructure environments.
  • Familiarity with fleet health systems, telemetry platforms, and hardware monitoring tools.
  • Experience with failure analysis methodologies such as FRACAS, RCCA, 5-Why, or FMEA.
  • Knowledge of Linux system administration and hardware validation workflows.
  • Industry certifications such as CompTIA Server+ or OEM hardware certifications.

Culture & Benefits

  • Opportunity to work on world-class AI infrastructure at scale.
  • Collaborative environment partnering with top-tier engineering and operations teams.
  • Commitment to safety, diversity, and inclusion in the workplace.
  • Focus on operational excellence and continuous improvement of infrastructure standards.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →