Senior Site Reliability Engineer (AWS)
Job description
TL;DR
Senior Site Reliability Engineer (AWS/Kubernetes): Build and optimize a cloud platform for game-based learning, with an emphasis on Kubernetes orchestration, GitOps pipelines, and AWS infrastructure. The role focuses on scaling systems for millions of users, developing internal CLI tooling in Go/Python, and improving system observability.
Location: Greater Toronto Area, Ontario
Salary: CA$165K – CA$185K
Company
The company is a global leader in game-based learning, providing an EdTech platform used by millions of students and hundreds of thousands of teachers.
What you will do
- Own and modernize critical systems across EKS, ArgoCD, and AWS to ensure scalability for a growing user base.
- Develop and maintain high-quality, modular Infrastructure as Code using Terraform and Helm.
- Build production-grade internal tooling and automation using Go and Python to enhance the developer experience.
- Lead incident response and on-call rotations, transforming production issues into architectural improvements.
- Optimize Datadog instrumentation and profile Node.js workloads to eliminate performance bottlenecks.
Requirements
- 5+ years of experience in SRE, Platform, or Infrastructure roles managing production systems at scale.
- Deep expertise in Kubernetes internals, including debugging complex failures and manifest management via Helm or Kustomize.
- Advanced proficiency in AWS (IAM, Networking, EKS) and writing reusable Terraform modules.
- Ability to write clean, maintainable code in Go or Python for internal tooling.
- High bar for written communication, specifically for documentation and postmortems.
Nice to have
- Experience with GitOps workflows using ArgoCD.
- Hands-on experience profiling or optimizing Node.js/TypeScript services.
- Knowledge of Service Mesh architectures or Kubernetes Gateway API.
- Background in EdTech or high-concurrency consumer platforms.
Culture & Benefits
- Opportunity to work in a senior-heavy team where individual contributions have a significant company-wide impact.
- Access to a modern tech stack including ArgoCD, Kubernetes Gateway API, and Drata.
- Learning-oriented culture that prioritizes "correct over quick" and blameless postmortems.
- Mission-driven environment dedicated to transforming global education.
- Comprehensive Total Rewards Program focusing on financial, physical, and mental well-being.