Company hidden

Updated 6 days ago

Staff Site Reliability Engineer (Incident Management)

133,700 - 248,300 CAD

Work format

remote (Canada only)

Employment type

full-time

Grade

senior

English

Country

Canada

Vacancy from Hirify Global, a list of international tech companies


Job description

TL;DR

Staff Site Reliability Engineer (SRE/Incident Management): Drive proactive reliability improvements and incident response strategies for a multi-cloud streaming platform, with an emphasis on systemic failure analysis and automation. The role focuses on building reliability tooling, defining SLO/SLA frameworks, and coaching teams through post-mortems to reduce incident recurrence.

Location: Remote (Canada). Must be able to work in Canada without sponsorship.

Salary: $133,700 – $248,300 CAD per year

Company

hirify.global Software builds AI-powered, cloud-native products that drive digital transformation for global businesses.

What you will do

Analyze systemic failure patterns and design reliability improvements to prevent incident recurrence.
Own and optimize incident management tooling, including Rootly, PagerDuty, Jira, and Slack integrations.
Define and maintain SLO/SLA frameworks, utilizing error budgets to prioritize reliability investments.
Lead the evolution of incident response standards and practices across the engineering organization.
Review and edit customer-facing incident documents (CRCAs) to ensure clarity and quality.
Develop training programs and coach engineering teams through the post-mortem process.

Requirements

10+ years of relevant experience in SRE, incident management, or reliability engineering.
Professional experience with at least one major cloud provider: AWS, GCP, or Azure.
Experience managing reliability programs within organizations of 500+ engineers.
Deep expertise with incident management tools such as Rootly or PagerDuty.
Strong understanding of distributed systems and failure modes at scale.
Must have the ability to work in Canada without sponsorship.

Nice to have

Expertise in Kafka or event streaming technologies.
Advanced knowledge of cloud-based infrastructure and resiliency engineering.
Proficiency in scripting languages and automation tools to optimize system performance.

Culture & Benefits

Global team structure with follow-the-sun coverage to ensure sustainable working hours.
Culture of curiosity, collaboration, and continuous learning.
Environment that encourages experimentation and professional growth.
Commitment to equal opportunity and inclusivity.

