Senior Site Reliability Engineer (SRE)

Work format

remote (Europe only)

Employment type

fulltime

Grade

senior

English

Country

Spain/Cyprus/Kazakhstan +1 more

This vacancy is from Hirify RU Global, a list of companies with Eastern European roots.

Job description

TL;DR

Senior Site Reliability Engineer (SRE): Drive the design, implementation, and evolution of a Kubernetes-based platform in a multi-cloud environment (GCP/AWS), with an emphasis on high availability and zero-downtime operations. The role also focuses on proactively applying AI-driven approaches to improve operational efficiency and automated bottleneck detection.

Location: Remote (Europe)

Company

hirify.global is looking for a Senior SRE to drive the design, implementation, and evolution of our Kubernetes-based platform in a multi-cloud environment (GCP/AWS).

What you will do

Lead the platform evolution by designing and operating our Kubernetes ecosystem (GKE, multi-cluster) with a focus on high availability and zero-downtime operations.
Build "Paved Roads" by owning and evolving our PaaS strategy, using GitOps (ArgoCD) and CI/CD (GitLab) to empower domain teams to deploy independently.
Architect reliability by defining and implementing our observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).
Drive infrastructure-as-code by leading the automation of our infrastructure using Terraform, ensuring all resources are standardized and version-controlled.
Own the error budget by partnering with engineering teams to establish and manage SLOs, SLAs, and incident management frameworks.
Design and participate in regular disaster recovery (DR) drills, implementing blue/green and active/passive strategies across regions to ensure service continuity.

Requirements

Strong hands-on experience managing Kubernetes (GKE preferred) in high-load, multi-cluster production environments.
Deep experience with GCP (AWS is a strong plus) and Terraform for large-scale infrastructure.
Solid experience with ArgoCD, GitLab CI, and the "Infrastructure as Code" philosophy.
Deep knowledge of the Prometheus/Grafana stack and experience implementing tracing and logging at scale.
Proven ability to design highly available 24/7 systems with automated failover and rollback capabilities.
English level B2+ for effective cross-functional communication.

Nice to have

Understanding of banking-grade standards and regulations such as PCI DSS, GDPR, or ISO 27001.
Experience with Kafka (Confluent), RabbitMQ, or managing high-load Redis and PostgreSQL clusters.
Experience using AI tools to improve alerting, anomaly detection, or engineering efficiency.
Experience with Vault for secret management and credential rotation.

Culture & Benefits

Make a genuine impact on the product.
Work in the EU.
Become a stock options holder.
Receive unwavering support and care.
Work & Swim program.
