Senior Site Reliability Engineer (SRE) (Europe)

Work format

Remote (Europe only)

Employment type

Full-time

Level

Senior

English

B2+

Country

Estonia

Vacancy from Hirify RU Global, a list of companies with Eastern European roots.

Job description

TL;DR

Senior Site Reliability Engineer (SRE): Drive the design, implementation, and evolution of a Kubernetes-based platform in a multi-cloud environment (GCP/AWS) with an emphasis on high availability and zero-downtime operations. The role also focuses on proactively applying AI-driven approaches to improve operational efficiency and automate bottleneck detection.

Location: Remote (Europe)

Company

hirify.global is looking for a Senior SRE to drive the design, implementation, and evolution of their Kubernetes-based platform in a multi-cloud environment (GCP/AWS).

What you will do

Lead the Platform Evolution: Design and operate the Kubernetes ecosystem (GKE, multi-cluster) with a focus on high availability and zero-downtime operations.
Build "Paved Roads": Own and evolve the PaaS strategy, using GitOps (ArgoCD) and CI/CD (GitLab) to empower domain teams to deploy independently.
Architect Reliability: Define and implement the observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).
Drive Infrastructure-as-Code: Lead the automation of the infrastructure using Terraform, ensuring all resources are standardized and version-controlled.
Own the Error Budget: Partner with engineering teams to establish and manage SLOs, SLAs, and incident management frameworks.
Drive Disaster Recovery: Design and run regular disaster recovery (DR) drills, implementing blue/green and active/passive strategies across regions to ensure service continuity.

Requirements

English level B2+ for effective cross-functional communication.
Strong hands-on experience managing Kubernetes (GKE preferred) in high-load, multi-cluster production environments.
Deep experience with GCP (AWS is a strong plus) and Terraform for large-scale infrastructure.
Solid experience with ArgoCD, GitLab CI, and the "Infrastructure as Code" philosophy.
Deep knowledge of the Prometheus/Grafana stack and implementing tracing/logging at scale.
Proven ability to design highly available 24/7 systems with automated failover and rollback capabilities.

Nice to have

Understanding of banking-grade compliance standards and regulations such as PCI DSS, GDPR, or ISO 27001.
Experience with Kafka (Confluent), RabbitMQ, or managing high-load Redis and PostgreSQL clusters.
Experience using AI tools to improve alerting, anomaly detection, or engineering efficiency.
Experience with Vault for secret management and credential rotation.

Culture & Benefits

Make a genuine impact on the product.
Work in the EU.
Become a stock options holder.
Receive unwavering support and care.
Work & Swim program.
