Principal Engineer (HPC Operations)
Job description
TL;DR
Principal Engineer (HPC Operations): Oversee daily operations and support of high-performance computing clusters for large-scale AI and ML workloads, with an emphasis on infrastructure stability, security, and resource optimization. The role focuses on managing complex Slurm and Kubernetes environments, automating deployments, and providing technical leadership to engineering teams.
Location: Abu Dhabi, United Arab Emirates
Company
The company is a leader in AI-powered cloud and digital infrastructure, providing sovereign AI capabilities and scalable high-performance compute solutions.
What you will do
- Oversee operational management of compute, storage, and networking for HPC clusters.
- Optimize HPC system performance, resource utilization, and scheduler efficiency using Slurm and Kubernetes.
- Act as the primary technical escalation point for L2 support teams to resolve complex incidents.
- Monitor system health using tools such as Prometheus, Grafana, and DCGM.
- Lead root cause analysis and implement continuous improvement initiatives for infrastructure.
- Mentor junior engineers and foster knowledge sharing across multidisciplinary teams.
Requirements
- Bachelor’s or Master’s degree in Computer Science or a related technical field.
- Minimum 8 years of experience in HPC operations, systems engineering, or DevOps.
- At least 2 years of experience in a leadership or ownership capacity.
- Advanced expertise in maintaining complex HPC hardware, software, and storage systems.
- Hands-on experience with Slurm and Kubernetes for AI/ML workloads.
- Proficiency in scripting and automation with Python, Bash, Ansible, and Terraform.
Culture & Benefits
- Focus on well-being and supporting the whole person.
- Collaborative environment valuing diversity and individual experience.
- Mutual respect and mindfulness as core components of company culture.
- Opportunity to work at the forefront of AI innovation in the Middle East.