Senior Site Reliability Engineer (AI Infrastructure)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Senior Site Reliability Engineer (AI Infrastructure): Ensuring the reliability, security, and operational excellence of AI and ML platforms with an accent on observability, automation, and safe operations. Focus on building automation, designing guardrails, hardening platforms, and leading incident response across cloud environments like Databricks and Azure.
Location: Hybrid (Mississauga, Ontario). Must reside within commutable distance to the office.
Salary: $139,000–$155,000 CAD + bonus + benefits
Company
is a SaaS company providing AI-driven platforms to improve patient care and clinical outcomes.
What you will do
- Own service level objectives (SLOs), error budgets, and reliability targets for cloud-based AI infrastructure.
- Design and maintain Infrastructure as Code (IaC), operational automation, and change control workflows using Terraform.
- Implement platform security controls, including network segmentation, secrets management, and data protection.
- Lead incident response and blameless postmortems, and conduct resiliency testing through game days.
- Mentor engineers and collaborate across teams to improve platform resiliency and cost efficiency.
Requirements
- 5+ years of experience in SRE or platform engineering supporting production cloud environments.
- Strong proficiency with observability frameworks (SLI/SLO), metrics, logging, and distributed tracing.
- Expertise in Terraform, GitOps practices, and CI/CD for infrastructure changes.
- Experience with cloud platform administration (Azure ML, Databricks, or Kubernetes).
- Working knowledge of platform security, encryption, and key management.
- Must be based within commutable distance to the Mississauga office.
Nice to have
- Experience with FinOps, multi-region patterns, or disaster recovery planning.
- Knowledge of container orchestration (Kubernetes) and progressive delivery (blue/green, canary).
- Experience working in highly regulated industries such as healthcare or life sciences.
Culture & Benefits
- Competitive base salary with additional bonus and benefits package.
- Hybrid workplace promoting a balance between remote work and in-office collaboration.
- Culture focused on a security-first mindset and reliability as a product.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →