Principal Site Reliability Engineer (AI)

Work format

onsite

Employment type

fulltime

Grade

senior

English

Country

UAE

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Principal Site Reliability Engineer (AI): Architect and lead the evolution of globally distributed infrastructure for AI and private cloud workloads, with an emphasis on advanced automation and AIOps. The focus is on building scalable, resilient, self-healing platforms and optimizing Kubernetes for GPU-intensive workloads.

Location: Abu Dhabi, United Arab Emirates

Company

hirify.global is a leader in AI-powered cloud and digital infrastructure, delivering technology solutions globally.

What you will do

Define the long-term roadmap for infrastructure, CI/CD, and Kubernetes platforms.
Design and implement AI-driven automation and self-healing systems for incident remediation and capacity optimization.
Architect high-performance Kubernetes environments optimized for multi-tenancy and GPU-intensive workloads.
Build observability platforms integrating metrics, logs, and tracing, and define SLOs/SLIs aligned with business outcomes.
Mentor SRE and DevOps teams while leading architectural reviews and internal Centers of Excellence.
Partner with product and engineering teams to balance innovation with reliability.

Requirements

10+ years of experience in Site Reliability Engineering, Platform Engineering, or Systems Architecture.
Proven experience designing and operating large-scale distributed systems.
Deep expertise in Kubernetes environments (EKS, GKE, or bare metal), including GPU workloads.
Strong programming skills in Python, Go, or Rust.
Extensive experience with Terraform, Helm, and infrastructure-as-code practices.
Strong understanding of observability systems (metrics, logging, tracing).

Nice to have

Experience with AI/ML infrastructure, including model serving and data pipelines.
Familiarity with scheduling frameworks such as Ray, Kueue, or Volcano.
Experience building automation or AI-driven operational tools.
Certifications such as CKA or AWS/Azure Solutions Architect.

Culture & Benefits

Competitive salary package and performance-based annual bonus.
Premium family health insurance, including dental, vision, and life insurance.
Unlimited access to top-tier learning platforms for professional growth.
Exclusive discount cards (Esaad and Fazaa) across a wide range of services.
Inclusive and collaborative environment with a diverse team of 1,100+ employees from 68 nationalities.
