HPC Cluster Architect

Формат работы

remote (только United_kingdom)

Тип работы

fulltime

Грейд

senior

Английский

Страна

Вакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:

TL;DR

HPC Cluster Architect (NVIDIA GPU/HPC): Own end-to-end cluster architecture for large-scale dedicated GPU deployments from customer requirements to production handover with an accent on compute, networking, storage, and physical design. Focus on designing high-performance network fabrics, engaging with vendors, and providing technical oversight during deployment and validation.

Location: UK (Remote)

Company

hirify.global is behind Hyperstack, a full-stack AI cloud delivering on-demand and private GPU infrastructure to AI researchers and enterprises running compute-intensive workloads.

What you will do

Own end-to-end cluster architecture for large-scale NVIDIA GPU deployments, including rack layouts, BOM, power, cooling, and production handover
Design high-performance network fabrics across compute (InfiniBand, RDMA, NVLink/NVSwitch), storage, and WAN with topology and scaling strategies
Engage directly with OEMs and vendors to validate hardware, review quotes, and optimize designs commercially
Provide technical oversight during deployment, hardware validation, performance testing, and issue escalation
Act as senior technical leader across Solutions Architecture, Cloud Engineering, and data centre partners to build standardized reference designs

Requirements

Proven experience designing and delivering GPU-based HPC or AI clusters at scale across full lifecycle
Deep hands-on knowledge of NVIDIA GPU platforms (H100/H200/B-series) and reference architectures
Strong InfiniBand/RDMA design experience including topology and performance tuning
Solid grounding in Linux systems, PCIe topology, NUMA alignment, and server performance
Background from OEM, hyperscaler, neo-cloud, or HPC environment with full design-to-deployment exposure
Confident engaging with customers, vendors, and teams as technical authority

Nice to have

Experience with Spectrum-X or next-generation Ethernet fabrics
Prior involvement in 1,000+ GPU cluster deployments and benchmarking (NCCL, MLPerf)
Exposure to air-cooled/liquid-cooled HPC and infrastructure-as-code

Culture & Benefits

Competitive salary and annual discretionary bonus
Employee wellbeing benefits and 25 days holiday plus public holidays
Flexible working (remote or hybrid depending on role and location)
Real ownership, autonomy, and career progression in a fast-growing company
Collaborative international culture with trust, transparency, and mission-driven colleagues

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →