Platform Engineer (Reliability)
Job description
TL;DR
Platform Engineer (Reliability) (SRE/Platform): Shape reliability and operational excellence practices to maintain high system uptime, with an emphasis on automation, runbooks, and incident learning. Focus on performance testing, capacity planning, observability, and embedding security/compliance into the engineering platform and delivery pipelines.
Location: Remote or hybrid in Spain, United Kingdom, Ireland, or Portugal
Company
is a global interactive entertainment company developing and live-operating mobile games.
What you will do
- Define reliability and operational excellence practices to reduce operational toil via automation, clear ownership, and well-defined runbooks.
- Drive performance testing, tuning, and capacity planning to meet SLAs while balancing cost, scalability, and reliability.
- Eliminate manual processes by building automation and software-driven solutions across services and codebases.
- Debug and resolve reliability and performance issues in production, contributing code changes to improve system behavior.
- Embed security, compliance, and governance into the engineering platform and delivery pipelines by default.
- Design and operate observability (metrics, logs, traces), participate in incident response and postmortems, and optimize cloud cost/efficiency.
Requirements
- Strong software engineering background with experience applying SRE or platform practices for reliability, scalability, and performance.
- Experience owning/operating production systems, including incident response, troubleshooting, and improvement based on operational learnings.
- Ability to take ownership of complex production systems and improve them over time with minimal supervision.
- Experience debugging complex distributed systems across service boundaries and improving reliability/observability.
- Infrastructure as Code experience with Terraform (or similar IaC tools).
- Experience operating containerized workloads on cloud-native platforms (Kubernetes/EKS, ECS, or equivalent) and familiarity with AWS Well-Architected (or equivalent).
Nice to have
- Experience mentoring engineers and influencing best practices across teams.
- Observability experience with Datadog.
- Exposure to cost optimization, capacity planning, and cloud governance at scale.
Culture & Benefits
- Remote or hybrid work arrangement based in Spain, United Kingdom, Ireland, or Portugal.
- Visa sponsorship and relocation assistance available from any location.
- Focus on continuous improvement through incident learning and systemic fixes.
- Cross-functional collaboration with Engineering and Product to balance delivery speed, reliability, and long-term sustainability.
Hiring process
- Interviews to assess reliability/platform engineering experience and production ownership.
- Technical evaluation focused on reliability, observability, automation, and incident/problem-solving approach.
- Final discussions to confirm fit and collaboration style.