2 days ago
AI Infrastructure Engineer (HPC)
Job description
TL;DR
AI Infrastructure Engineer (AI/HPC): Architecting and maintaining large-scale AI compute environments in high-density data centers, with an emphasis on GPU cluster orchestration and high-performance networking. Focus on optimizing InfiniBand/RoCE v2, implementing high-throughput storage, and automating infrastructure via IaC.
Location: Tydal, Norway (Onsite)
Company
is a world-leading technology company specializing in Bitcoin mining solutions and AI cloud infrastructure globally.
What you will do
- Deploy and manage large-scale GPU clusters using orchestration platforms such as Kubernetes or Slurm.
- Optimize high-speed, low-latency networking, including InfiniBand and RoCE v2, for distributed compute.
- Plan and monitor rack density across AI infrastructure in collaboration with Project and Operations teams.
- Implement and maintain high-throughput storage systems such as Lustre, BeeGFS, and WekaIO.
- Automate infrastructure provisioning and configuration using Terraform or Ansible.
- Troubleshoot and optimize performance across compute, networking, and storage in mission-critical environments.
Requirements
- Degree in Computer Science, Data Engineering, or a related technical field (Master’s preferred).
- Strong experience with Linux administration, containerization, and GPU infrastructure.
- Experience with workload management platforms such as Kubernetes or Slurm.
- Familiarity with the NVIDIA infrastructure stack (CUDA, NCCL, Triton Inference Server).
- Deep understanding of NVIDIA GB300 and VR NVL72 Scalable Units’ functionality.
- Experience with high-performance networking (InfiniBand, RoCE v2) and storage platforms (Lustre, BeeGFS, WekaIO).
Nice to have
- NVIDIA, Kubernetes, or cloud infrastructure certifications.
Culture & Benefits
- Commitment to equal employment opportunities regardless of race, gender, religion, or background.
- Opportunity to work with cutting-edge NVIDIA hardware and high-density AI data center technology.
- Global operational environment spanning multiple countries including the US, Canada, and Norway.