Senior/Principal DevOps (GCP)
Job description
TL;DR
Senior/Principal DevOps (GCP): Own and evolve production infrastructure for a SaaS platform serving game developers, with an emphasis on high availability, scalability under bursty high-load traffic, and deep observability. Focus on designing autoscaling strategies, leading incident response and postmortems, building IaC with Terraform, and operating Kubernetes on GKE.
Location: On-site in Lisbon, Portugal
Company
The company helps game developers achieve financial and creative independence by providing the solutions they need to launch, run, and grow their businesses.
What you will do
- Own and evolve cloud infrastructure on GCP and Cloudflare, maintaining high availability for a SaaS platform serving B2B and B2C customers.
- Design scaling strategies for 10–50× load spikes across compute, networking, and data layers.
- Be accountable for SLA/SLO outcomes: lead incident detection, mitigation, postmortems, and root cause fixes.
- Build and maintain IaC with Terraform/Terragrunt, own GKE operations, Helm charts, and manifests.
- Implement end-to-end observability in Datadog: dashboards, monitors, alerts for metrics/logs/APM.
- Configure DevSecOps tooling, triage security findings, and optimize CI/CD in GitHub Actions.
- Manage costs with FinOps: visibility, right-sizing, and waste reduction.
Requirements
- 5+ years of production DevOps/SRE experience, with impact on high-load SaaS under SLA commitments.
- Hands-on GCP expertise, especially GKE and services like Cloud SQL, BigQuery, Pub/Sub.
- Strong IaC with Terraform; Kubernetes operations at scale.
- Cloudflare experience; Datadog for observability.
- Scripting/automation skills and reliability mindset.
- On-site in Lisbon required.
Nice to have
- Experience with game development or bursty, high-load consumer products.
- Experience with SOC 2/PCI-DSS audits; a service mesh such as Cloud Service Mesh.
- Mature SRE practices: error budgets, on-call, runbooks.
Culture & Benefits
- Small team (15–20 engineers) with high autonomy and fast decision-making.
- Cloud-only infrastructure with real ownership of stability, scaling, and costs.
- Shape SRE culture, tooling, and standards in a fast-growing startup.