Company hidden
2 days ago

Principal Site Reliability Engineer (AI)

Work format
onsite
Employment type
fulltime
Grade
senior
English
b2
Country
UAE

Job description

TL;DR

Principal Site Reliability Engineer (AI): Architecting and leading the evolution of globally distributed infrastructure for AI and private cloud workloads, with an emphasis on advanced automation and AIOps. Focus on building scalable, resilient, self-healing platforms and optimizing Kubernetes for GPU-intensive workloads.

Location: Abu Dhabi, United Arab Emirates

Company

hirify.global is a leader in AI-powered cloud and digital infrastructure driving transformative technology solutions globally.

What you will do

  • Define the long-term roadmap for infrastructure, CI/CD, and Kubernetes platforms.
  • Design and implement AI-driven automation and self-healing systems for incident remediation and capacity optimization.
  • Architect high-performance Kubernetes environments optimized for multi-tenancy and GPU-intensive workloads.
  • Build observability platforms integrating metrics, logs, and tracing, and define SLOs/SLIs aligned with business outcomes.
  • Mentor SRE and DevOps teams while leading architectural reviews and internal Centers of Excellence.
  • Partner with product and engineering teams to balance innovation with reliability.

Requirements

  • 10+ years of experience in Site Reliability Engineering, Platform Engineering, or Systems Architecture.
  • Proven experience designing and operating large-scale distributed systems.
  • Deep expertise in Kubernetes environments (EKS, GKE, or bare metal), including GPU workloads.
  • Strong programming skills in Python, Go, or Rust.
  • Extensive experience with Terraform, Helm, and infrastructure-as-code practices.
  • Strong understanding of observability systems (metrics, logging, tracing).

Nice to have

  • Experience with AI/ML infrastructure, including model serving and data pipelines.
  • Familiarity with scheduling frameworks such as Ray, Kueue, or Volcano.
  • Experience building automation or AI-driven operational tools.
  • Certifications such as CKA or AWS/Azure Solutions Architect.

Culture & Benefits

  • Competitive salary package and performance-based annual bonus.
  • Premium family health insurance, including dental, vision, and life insurance.
  • Unlimited access to top-tier learning platforms for professional growth.
  • Exclusive discount cards (Esaad and Fazaa) across a wide range of services.
  • Inclusive and collaborative environment with a diverse team of 1,100+ employees from 68 nationalities.
