Company hidden
5 hours ago

Senior Site Reliability Engineer (SRE) (Europe)

Work format
remote (Europe only)
Employment type
fulltime
Grade
senior
English
b2
Country
Estonia
Vacancy from Hirify.Global, a list of companies with Eastern European roots

Job description

TL;DR

Senior Site Reliability Engineer (SRE): Driving the design, implementation, and evolution of a Kubernetes-based platform in a multi-cloud environment (GCP/AWS), with an emphasis on high availability and zero-downtime operations. The role also focuses on proactively applying AI-driven approaches to improve operational efficiency and automate bottleneck detection.

Location: Remote (Europe)

Company

hirify.global is looking for a Senior Site Reliability Engineer to drive the design, implementation, and evolution of its Kubernetes-based platform in a multi-cloud environment (GCP/AWS).

What you will do

  • Lead the Platform Evolution: Design and operate the Kubernetes ecosystem (GKE, multi-cluster) with a focus on high availability and zero-downtime operations.
  • Build "Paved Roads": Own and evolve the PaaS strategy, using GitOps (ArgoCD) and CI/CD (GitLab) to empower domain teams to deploy independently.
  • Architect Reliability: Define and implement the observability strategy across metrics, logs, and tracing (Prometheus, VictoriaMetrics, OpenTelemetry).
  • Drive Infrastructure-as-Code: Lead the automation of the infrastructure using Terraform, ensuring all resources are standardized and version-controlled.
  • Own the Error Budget: Partner with engineering teams to establish and manage SLOs, SLAs, and incident management frameworks.
  • Design and participate in regular DR drills, implementing blue/green and active/passive strategies across regions to ensure service continuity.

Requirements

  • English level B2+ for effective cross-functional communication.
  • Strong hands-on experience managing Kubernetes (GKE preferred) in high-load, multi-cluster production environments.
  • Deep experience with GCP (AWS is a strong plus) and Terraform for large-scale infrastructure.
  • Solid experience with ArgoCD, GitLab CI, and the "Infrastructure as Code" philosophy.
  • Deep knowledge of the Prometheus/Grafana stack and experience implementing tracing and logging at scale.
  • Proven ability to design highly available 24/7 systems with automated failover and rollback capabilities.

Nice to have

  • Understanding of banking-grade standards like PCI DSS, GDPR, or ISO 27001.
  • Experience with Kafka (Confluent), RabbitMQ, or managing high-load Redis and PostgreSQL clusters.
  • Experience using AI tools to improve alerting, anomaly detection, or engineering efficiency.
  • Experience with Vault for secret management and credential rotation.

Culture & Benefits

  • Make a genuine impact on the product.
  • Work in the EU.
  • Become a stock options holder.
  • Receive unwavering support and care.
  • Work & Swim program.
