3 days ago
Data Center Operations Engineer (Linux/GPU)
Job description
TL;DR
Data Center Operations Engineer (Linux/GPU): Maintain and deploy critical data center infrastructure, with an emphasis on Linux-based systems, GPU server clusters, and InfiniBand networking. The role focuses on hardware installation, cluster bring-up, and troubleshooting of compute and network environments.
Location: On-site in San Jose, USA
Company
A leading provider of electronic design automation (EDA) software and hardware tools that enable the design of integrated circuits.
What you will do
- Provide hands-on operational support for all data center projects, deployments, and repair activities.
- Troubleshoot and resolve operational issues related to Linux servers, GPU platforms, and storage infrastructure.
- Perform InfiniBand fabric bring-up, switch configuration, subnet management, and troubleshooting.
- Install, configure, and maintain server hardware, including rack and stack, cabling, and component replacement.
- Coordinate with vendors and global teams for hardware delivery, diagnostics, and warranty services.
- Maintain accurate operational documentation, system configurations, and technical runbooks.
Requirements
- Bachelor’s degree in Computer Science, Engineering, IT, or equivalent practical experience.
- Strong hands-on experience in Linux administration, troubleshooting, and shell scripting (Bash).
- Experience with cluster bring-up and validating GPU servers in clustered environments.
- Working knowledge of InfiniBand networking and the TCP/IP protocol suite.
- Experience installing and troubleshooting routers, switches, and terminal servers.
- Ability to lift 50+ lbs and work in physical data center environments (raised floors, equipment racks).
Nice to have
- Experience supporting HPC, AI, or large-scale GPU environments.
- Exposure to data center monitoring and alerting frameworks.
- Familiarity with large-scale data center buildouts or refresh programs.
Culture & Benefits
- Opportunity to work with cutting-edge AI and GPU infrastructure.
- Fast-paced operational setting focused on rapid problem resolution and reliability.
- Collaboration with cross-functional global teams across different time zones.
- Structured incident management and escalation procedures.