Data Center Operations Engineer (Linux/GPU)

Work format

On-site

Employment type

Full-time

Job description

TL;DR

Data Center Operations Engineer (Linux/GPU): Maintain and deploy critical data center infrastructure, with an emphasis on Linux-based systems, GPU server clusters, and InfiniBand networking. The role focuses on hardware installation, cluster bring-up, and troubleshooting of compute and network environments.

Location: On-site in San Jose, USA

Company

hirify.global is a leading provider of electronic design automation (EDA) software and hardware tools for integrated circuit design.

What you will do

Provide hands-on operational support for all data center projects, deployments, and repair activities.
Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, and storage infrastructure.
Perform InfiniBand fabric bring-up, switch configuration, subnet management, and troubleshooting.
Install, configure, and maintain server hardware, including rack and stack, cabling, and component replacement.
Coordinate with vendors and global teams for hardware delivery, diagnostics, and warranty services.
Maintain accurate operational documentation, system configurations, and technical runbooks.

Requirements

Bachelor’s degree in Computer Science, Engineering, IT, or equivalent practical experience.
Strong hands-on experience in Linux administration, troubleshooting, and shell scripting (Bash).
Experience with cluster bring-up and validating GPU servers in clustered environments.
Working knowledge of InfiniBand networking and the TCP/IP protocol suite.
Experience installing and troubleshooting routers, switches, and terminal servers.
Ability to lift 50+ lbs and work in physical data center environments (raised floors, equipment racks).

Nice to have

Experience supporting HPC, AI, or large-scale GPU environments.
Exposure to data center monitoring and alerting frameworks.
Familiarity with large-scale data center buildouts or refresh programs.

Culture & Benefits

Opportunity to work with cutting-edge AI and GPU infrastructure.
Fast-paced operational setting focused on rapid problem resolution and reliability.
Collaboration with cross-functional global teams across different time zones.
Structured incident management and escalation procedures.
