Company hidden
3 days ago

Data Center Operations Engineer (Linux/GPU)

Work format
On-site
Employment type
Full-time
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Data Center Operations Engineer (Linux/GPU): Maintain and deploy critical data center infrastructure, with an emphasis on Linux-based systems, GPU server clusters, and InfiniBand networking. The role centers on hardware installation, cluster bring-up, and troubleshooting of compute and network environments.

Location: On-site in San Jose, USA

Company

The company (name hidden in this listing) provides electronic design automation (EDA) software and hardware tools for the design of integrated circuits.

What you will do

  • Provide hands-on operational support for all data center projects, deployments, and repair activities.
  • Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, and storage infrastructure.
  • Perform InfiniBand fabric bring-up, switch configuration, subnet management, and troubleshooting.
  • Install, configure, and maintain server hardware, including rack and stack, cabling, and component replacement.
  • Coordinate with vendors and global teams for hardware delivery, diagnostics, and warranty services.
  • Maintain accurate operational documentation, system configurations, and technical runbooks.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, IT, or equivalent practical experience.
  • Strong hands-on experience in Linux administration, troubleshooting, and shell scripting (Bash).
  • Experience with cluster bring-up and validating GPU servers in clustered environments.
  • Working knowledge of InfiniBand networking and the TCP/IP protocol suite.
  • Experience installing and troubleshooting routers, switches, and terminal servers.
  • Ability to lift 50+ lbs and work in physical data center environments (raised floors, equipment racks).

Nice to have

  • Experience supporting HPC, AI, or large-scale GPU environments.
  • Exposure to data center monitoring and alerting frameworks.
  • Familiarity with large-scale data center buildouts or refresh programs.

Culture & Benefits

  • Opportunity to work with cutting-edge AI and GPU infrastructure.
  • Fast-paced operational setting focused on rapid problem resolution and reliability.
  • Collaboration with cross-functional global teams across different time zones.
  • Structured incident management and escalation procedures.
