Senior Site Reliability Engineer
Job description
TL;DR
Senior Site Reliability Engineer (Kubernetes/AWS): Build and maintain the infrastructure and operational systems for Thunderbird, with an emphasis on EKS-based Kubernetes operations, CI/CD reliability, and production incident diagnosis. The role centers on infrastructure-as-code with Pulumi and/or Terraform/OpenTofu, improvements to the observability stack, and security-conscious AWS practices, including least-privilege IAM and secrets management.
Location: Remote (Canada)
Salary: $108,000 - $125,000 CAD
Company
The company builds privacy-respecting communication tools through the Thunderbird team.
What you will do
- Operate and evolve an EKS-based Kubernetes platform, supporting service migrations and reliability initiatives.
- Design and develop CI/CD systems for websites, services, and Thunderbird desktop release workflows, including OIDC authentication in GitHub Actions.
- Write and maintain infrastructure using Pulumi and/or Terraform/OpenTofu across multiple AWS accounts.
- Operate and improve observability using VictoriaMetrics, VictoriaLogs, Grafana, and Vector; partner with engineering teams on instrumentation and monitoring.
- Apply security-conscious infrastructure practices (least-privilege IAM, secrets management via AWS Secrets Manager and External Secrets Operator, and network segmentation).
- Diagnose production incidents, run root-cause analysis, drive post-incident improvements, and contribute runbooks and architecture documentation.
Requirements
- 7+ years of experience in infrastructure, platform engineering, or site reliability, including hands-on production Kubernetes operations and cluster management.
- Hands-on infrastructure-as-code experience on AWS using Terraform, OpenTofu, or Pulumi.
- Security awareness in day-to-day infrastructure work: identity, least privilege, secrets hygiene, and network controls.
- Excellent async written communication skills and comfort working with a geographically distributed team.
- Ability to collaborate effectively with software engineers and non-engineering stakeholders to improve reliability and operational efficiency.
- Must reside in and have permanent work authorization for Canada; no visa sponsorship.
Nice to have
- Experience with GitOps workflows (ArgoCD or Flux).
- Familiarity with Keycloak or similar identity platforms (OIDC, SAML, federation).
- Knowledge of email protocols and/or experience operating email infrastructure (SMTP, IMAP).
- Prior work in or alongside an open-source community.
- French, German, Japanese, or other language proficiency in addition to English.
Culture & Benefits
- Full-time fully remote role with schedule flexibility and a distributed team across time zones.
- Company-provided laptop, monthly remote work stipend, and annual professional development stipend.
- Annual bonus program, industry conferences, and company all-hands/team gatherings.
- 24 days PTO per year (prorated), year-end company shutdown, wellbeing days, and public holidays.
- Health, dental, and vision insurance; RRSP contributions; disability and life insurance; paid parental leave and paid sick days.
Hiring process
- Submit application with cover letter and screening questions; responses are reviewed carefully.
- Interviews and evaluations focus on production reliability thinking, security awareness, and collaboration in a distributed environment.