12 часов назад
Senior Site Reliability Engineer (GCP)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
Текст:
TL;DR
Senior Site Reliability Engineer (GCP/Kubernetes): Designing and evolving a multi-cloud Kubernetes-based platform with an accent on high availability, zero-downtime operations, and platform architecture. Focus on implementing GitOps workflows, automating infrastructure via Terraform, and establishing advanced observability and disaster recovery strategies.
Location: Remote (Serbia)
Company
is a fintech company providing a modern financial platform for businesses.
What you will do
- Design and operate a multi-cluster Kubernetes ecosystem (GKE) focused on high availability and zero-downtime.
- Evolve the PaaS strategy using GitOps (ArgoCD) and CI/CD (GitLab) to enable independent domain team deployments.
- Implement a comprehensive observability strategy across metrics, logs, and tracing using Prometheus, VictoriaMetrics, and OpenTelemetry.
- Lead infrastructure automation using Terraform to ensure standardized and version-controlled resources.
- Establish and manage SLOs, SLAs, and incident management frameworks in collaboration with engineering teams.
- Design and execute disaster recovery drills using blue/green and active/passive strategies across regions.
Requirements
- Strong hands-on experience managing Kubernetes (GKE) in high-load, multi-cluster production environments.
- Deep expertise with GCP and Terraform for large-scale infrastructure automation.
- Solid experience with ArgoCD, GitLab CI, and the Infrastructure as Code philosophy.
- Deep knowledge of the Prometheus/Grafana stack and implementing tracing/logging at scale.
- Proven ability to design highly available 24/7 systems with automated failover and rollback capabilities.
- English: B2+ level required for effective cross-functional communication.
Nice to have
- Understanding of banking-grade standards such as PCI DSS, GDPR, or ISO 27001.
- Experience managing Kafka, RabbitMQ, and high-load Redis or PostgreSQL clusters.
- Experience using AI tools to improve alerting, anomaly detection, or engineering efficiency.
- Experience with HashiCorp Vault for secret management and credential rotation.
Culture & Benefits
- Opportunity to make a genuine impact on the product development.
- Possibility to work in the EU.
- Equity through stock options.
- Strong support system and company care.
- Unique Work & Swim program.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →