Senior Site Reliability Engineer

151,000 - 190,000 CAD

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

US/Canada

Job description

Text:

TL;DR

Senior Site Reliability Engineer: Improve and protect the reliability, performance, and operability of an AWS-based production system, with an emphasis on SRE practices, observability, and infrastructure as code. The role focuses on designing reliable services, contributing to automation, and participating in incident response for a fast-moving software platform.

Location: Hybrid in Toronto (Tuesday-Thursday in-office, Monday/Friday WFH). Also open to remote applicants from most US states (excluding Alabama, Alaska, Connecticut, Hawaii, Kentucky, Mississippi, Nebraska, New Mexico, North Dakota, Rhode Island, South Dakota, West Virginia, and Wyoming).

Salary: $151,000–$190,000 CAD base salary range + annual bonus

Company

hirify.global is transforming the commercial contracting industry with its software platform and recently achieved unicorn status after a $127M Series C funding round.

What you will do

Drive and refine modern SRE practices, including SLIs/SLOs and error budgets.
Design and maintain end-to-end observability (metrics, logs, traces, dashboards, and alerts).
Partner with product and engineering teams to design reliable services, reviewing architectures and rollout strategies.
Evolve and operate AWS infrastructure using Infrastructure as Code (Terraform).
Contribute code to services, tooling, and automation.
Participate in incident response for infrastructure-related production issues.

Requirements

5+ years of professional experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or production-focused Software Engineering.
Proven experience leading multi-sprint, multi-engineer reliability or infrastructure projects.
Thorough understanding of, and hands-on experience with, modern SRE practices such as SLIs/SLOs, error budgets, and toil reduction.
Software engineering experience; comfortable working in Python or Node.js/TypeScript.
Strong interest in, and experience with, using LLMs and AI-assisted tooling in your workflow.
Strong observability skills, including designing metrics, logging, and tracing for multi-service systems (Datadog, Prometheus, Grafana experience preferred).
Experience with AWS in production, Terraform, and container/orchestration platforms (Docker with ECS, EKS, or Kubernetes).
Ability and willingness to participate in a production on-call rotation.
Ability to work a hybrid schedule in Toronto (Tuesday-Thursday in-office) or remotely from eligible US states.

Nice to have

Incident management experience, including participating in or coordinating incident response and working with an incident management tool.

Culture & Benefits

Generous equity grant and a comprehensive benefits package.
Flexible PTO and hybrid work schedules with a work-from-home stipend.
Company events like BBQs and team-building activities, both in-person and virtual.
Fast-paced, collaborative, and dynamic work environment.
Opportunities for growth and career advancement.
Chance to work with cutting-edge technology and innovative solutions.