Principal Site Reliability Engineer
Job description
TL;DR
Principal Site Reliability Engineer (Kubernetes/AWS/GCP): Shape the long-term strategy and architecture for the cloud and on-premise infrastructure powering a high-demand sports betting and gaming platform, with an emphasis on Kubernetes reliability, scalability, and operational consistency. Focus on defining SLOs and error budgets, building automation-first infrastructure (Infrastructure as Code, GitOps, self-healing), leading major incidents and post-incident improvements, and mentoring senior engineers to raise platform resilience and improve developer experience.
Location: Remote (US)
Company
A publicly traded technology company powering sports betting and gaming.
What you will do
- Define and execute long-term strategy for the Kubernetes platform across Google Kubernetes Engine, Amazon Elastic Kubernetes Service, RKE2, and on-premise environments.
- Drive architectural decisions for cluster lifecycle management, networking, identity and access management, observability, autoscaling, capacity planning, and cost optimization.
- Lead large-scale platform initiatives across multiple engineering teams, setting technical direction, standards, and measurable reliability outcomes.
- Establish and evolve reliability practices using SLOs, SLIs, and error budget frameworks aligned to business priorities.
- Build automation-first infrastructure with Infrastructure as Code, GitOps workflows, self-healing systems, and internal platform tooling.
- Lead critical platform incidents and drive post-incident improvements to strengthen resilience; mentor senior engineers through architecture reviews and coaching.
Requirements
- Location: Must be based in the United States (Remote - US)
- Bachelor’s degree in Computer Science or a related technical field.
- At least 8 years of experience designing, operating, and scaling distributed cloud and on-premise infrastructure, including at least 3 years at Staff/Principal (or equivalent) technical leadership level.
- Proven experience leading large-scale infrastructure or platform initiatives with cross-functional alignment and long-term technical ownership.
- Deep expertise with Kubernetes (cluster architecture, networking, storage, security, operators, lifecycle management) and large-scale production operations.
- Extensive production infrastructure experience on AWS and Google Cloud Platform using Infrastructure as Code (e.g., Terraform, Pulumi), plus strong software development experience in Go and/or Python.
Culture & Benefits
- Opportunity to shape infrastructure strategy for one of the most demanding sports betting and gaming platforms.
- Automation-first approach to improve engineering velocity and reduce operational overhead.
- Responsible adoption of AI-powered engineering capabilities to improve operational efficiency and incident response.
- Mentorship and technical leadership through architecture reviews, coaching, and measurable reliability outcomes.
- Equal-opportunity employer; support through the licensing process if required by state gaming regulations.