HPC Cluster Architect
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
HPC Cluster Architect (NVIDIA GPU/HPC): Own end-to-end cluster architecture for large-scale dedicated GPU deployments from customer requirements to production handover with an accent on compute, networking, storage, and physical design. Focus on designing high-performance network fabrics, engaging with vendors, and providing technical oversight during deployment and validation.
Location: UK (Remote)
Company
is behind Hyperstack, a full-stack AI cloud delivering on-demand and private GPU infrastructure to AI researchers and enterprises running compute-intensive workloads.
What you will do
- Own end-to-end cluster architecture for large-scale NVIDIA GPU deployments, including rack layouts, BOM, power, cooling, and production handover
- Design high-performance network fabrics across compute (InfiniBand, RDMA, NVLink/NVSwitch), storage, and WAN with topology and scaling strategies
- Engage directly with OEMs and vendors to validate hardware, review quotes, and optimize designs commercially
- Provide technical oversight during deployment, hardware validation, performance testing, and issue escalation
- Act as senior technical leader across Solutions Architecture, Cloud Engineering, and data centre partners to build standardized reference designs
Requirements
- Proven experience designing and delivering GPU-based HPC or AI clusters at scale across full lifecycle
- Deep hands-on knowledge of NVIDIA GPU platforms (H100/H200/B-series) and reference architectures
- Strong InfiniBand/RDMA design experience including topology and performance tuning
- Solid grounding in Linux systems, PCIe topology, NUMA alignment, and server performance
- Background from OEM, hyperscaler, neo-cloud, or HPC environment with full design-to-deployment exposure
- Confident engaging with customers, vendors, and teams as technical authority
Nice to have
- Experience with Spectrum-X or next-generation Ethernet fabrics
- Prior involvement in 1,000+ GPU cluster deployments and benchmarking (NCCL, MLPerf)
- Exposure to air-cooled/liquid-cooled HPC and infrastructure-as-code
Culture & Benefits
- Competitive salary and annual discretionary bonus
- Employee wellbeing benefits and 25 days holiday plus public holidays
- Flexible working (remote or hybrid depending on role and location)
- Real ownership, autonomy, and career progression in a fast-growing company
- Collaborative international culture with trust, transparency, and mission-driven colleagues
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →