Site Reliability Engineer
Job description
TL;DR
Site Reliability Engineer: Building and optimizing resilient multi-regional infrastructure for a global gaming platform, with an emphasis on hybrid cloud migration, high availability, and system scalability. The role focuses on designing automated deployment pipelines, managing on-call incident response, and ensuring stable performance for millions of concurrent users.
Company
.com is the world's leading platform for playing, learning, and enjoying , serving over 250 million players globally with a mission-driven, fully remote, and flat organizational culture.
What you will do
- Design and implement multi-regional resilient infrastructure to support millions of daily concurrent sessions.
- Lead the hybrid cloud migration strategy, integrating bare-metal data centers with cloud services.
- Own the on-call rotation and incident response procedures to maintain high-availability SLAs.
- Architect monitoring and alerting systems to proactively identify and resolve performance bottlenecks.
- Collaborate with development teams to implement infrastructure-as-code and continuous delivery pipelines.
- Drive automation initiatives to reduce manual operational overhead and improve system reliability.
Requirements
- 5+ years of experience in site reliability engineering, DevOps, or infrastructure engineering.
- Strong proficiency with UNIX/Linux operating systems and command-line administration.
- Experience managing bare-metal server infrastructure and data center operations.
- Hands-on experience with cloud platforms (GCP, AWS, or Azure) and infrastructure-as-code tools like Terraform.
- Solid understanding of networking fundamentals, protocols, and network troubleshooting.
- Experience with containerization and orchestration technologies such as Docker and Kubernetes.
Nice to have
- Advanced knowledge of CDNs and edge computing.
- Experience with scripting languages like Python, Go, or Bash.
- Background in high-availability architectures and disaster recovery planning.
- Experience with game server infrastructure or real-time application hosting.
- Previous experience in a fully remote, distributed work environment.
Culture & Benefits
- 100% remote work environment with a team distributed across 60+ countries.
- Mission-driven, flat, and life-celebrating company culture.
- Opportunity to work on a massive-scale platform serving 250M+ users.
- Focus on technical infrastructure decisions that directly impact global gaming experiences.