Posted 2 days ago
Principal Site Reliability Engineer (AI)
Job description
TL;DR
Principal Site Reliability Engineer (AI): architect and lead the evolution of globally distributed infrastructure for AI and private cloud workloads, with an emphasis on advanced automation and AIOps. The focus is on building scalable, resilient, self-healing platforms and optimizing Kubernetes for GPU-intensive workloads.
Location: Abu Dhabi, United Arab Emirates
Company
is a leader in AI-powered cloud and digital infrastructure driving transformative technology solutions globally.
What you will do
- Define the long-term roadmap for infrastructure, CI/CD, and Kubernetes platforms.
- Design and implement AI-driven automation and self-healing systems for incident remediation and capacity optimization.
- Architect high-performance Kubernetes environments optimized for multi-tenancy and GPU-intensive workloads.
- Build observability platforms integrating metrics, logs, and tracing, and define SLOs/SLIs aligned with business outcomes.
- Mentor SRE and DevOps teams while leading architectural reviews and internal Centers of Excellence.
- Partner with product and engineering teams to balance innovation with reliability.
Requirements
- 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Systems Architecture.
- Proven experience designing and operating large-scale distributed systems.
- Deep expertise in Kubernetes environments (EKS, GKE, or bare metal), including GPU workloads.
- Strong programming skills in Python, Go, or Rust.
- Extensive experience with Terraform, Helm, and infrastructure-as-code practices.
- Strong understanding of observability systems (metrics, logging, tracing).
Nice to have
- Experience with AI/ML infrastructure, including model serving and data pipelines.
- Familiarity with scheduling frameworks such as Ray, Kueue, or Volcano.
- Experience building automation or AI-driven operational tools.
- Certifications such as CKA or AWS/Azure Solutions Architect.
Culture & Benefits
- Competitive salary package and performance-based annual bonus.
- Premium family health insurance, including dental, vision, and life insurance.
- Unlimited access to top-tier learning platforms for professional growth.
- Exclusive discount cards (Esaad and Fazaa) across a wide range of services.
- Inclusive and collaborative environment with a diverse team of 1,100+ employees from 68 nationalities.