Company hidden
12 hours ago

Senior Site Reliability Engineer (GCP)

Work format
remote (Europe only)
Employment type
full-time
Grade
senior
English
B2
Country
Serbia/Armenia/Bulgaria
Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Senior Site Reliability Engineer (GCP/Kubernetes): Design and evolve a multi-cloud Kubernetes-based platform with an emphasis on high availability, zero-downtime operations, and platform architecture. The role focuses on implementing GitOps workflows, automating infrastructure with Terraform, and establishing advanced observability and disaster recovery strategies.

Location: Remote (Serbia)

Company

hirify.global is a fintech company providing a modern financial platform for businesses.

What you will do

  • Design and operate a multi-cluster Kubernetes ecosystem (GKE) focused on high availability and zero-downtime operations.
  • Evolve the PaaS strategy using GitOps (ArgoCD) and CI/CD (GitLab) to enable independent domain team deployments.
  • Implement a comprehensive observability strategy across metrics, logs, and tracing using Prometheus, VictoriaMetrics, and OpenTelemetry.
  • Lead infrastructure automation using Terraform to ensure standardized and version-controlled resources.
  • Establish and manage SLOs, SLAs, and incident management frameworks in collaboration with engineering teams.
  • Design and execute disaster recovery drills using blue/green and active/passive strategies across regions.

Requirements

  • Strong hands-on experience managing Kubernetes (GKE) in high-load, multi-cluster production environments.
  • Deep expertise with GCP and Terraform for large-scale infrastructure automation.
  • Solid experience with ArgoCD, GitLab CI, and the Infrastructure as Code philosophy.
  • Deep knowledge of the Prometheus/Grafana stack and experience implementing tracing and logging at scale.
  • Proven ability to design highly available 24/7 systems with automated failover and rollback capabilities.
  • English: B2+ level required for effective cross-functional communication.

Nice to have

  • Understanding of banking-grade standards and regulations such as PCI DSS, GDPR, or ISO 27001.
  • Experience managing Kafka, RabbitMQ, and high-load Redis or PostgreSQL clusters.
  • Experience using AI tools to improve alerting, anomaly detection, or engineering efficiency.
  • Experience with HashiCorp Vault for secret management and credential rotation.

Culture & Benefits

  • Opportunity to make a genuine impact on product development.
  • Possibility to work in the EU.
  • Equity through stock options.
  • Strong support system and company care.
  • Unique Work & Swim program.
