Manager, Site Reliability Engineering

Work format

hybrid

Employment type

full-time

Job description

TL;DR

Manager, Site Reliability Engineering: Build and lead a high-performing SRE team that ensures system reliability, scalability, and observability for the hirify.global Data Cloud, with an emphasis on driving SRE principles (SLIs/SLOs, error budgets) and incident excellence. The role focuses on making code contributions, leading software-first reliability investments, and operationalizing reliability strategies.

Location: Office-based in Czechia; remote work is possible for candidates located in Czechia.

Company

hirify.global is the #1 global market leader in data resilience, providing data backup, recovery, portability, security, and intelligence to over 550,000 customers worldwide.

What you will do

Hire, onboard, and grow your SRE team, fostering a psychologically safe, blameless culture.
Establish and operationalize SLIs/SLOs and error budgets with service owners.
Ensure incident response readiness, lead/coordinate major incidents, and drive fast, high-quality postmortems.
Lead software-first reliability investments in observability, deployment safety, and resilience testing.
Drive platform improvements (IaC, CI/CD, Kubernetes) and internal tools that scale operations.
Track and cap toil, ensuring sustainable operational coverage and monitoring on-call health.

Requirements

7+ years in Software, Platform, and/or Reliability Engineering with 2+ years managing engineers.
Demonstrable experience leading engineering teams to predictably deliver outcomes.
Experience with public cloud (Azure preferred), Kubernetes, IaC (Terraform, Pulumi), CI/CD (GitHub Actions, ArgoCD, Azure DevOps), and observability (OpenTelemetry, Elastic, Datadog, Prometheus, Grafana).
Coding background with experience improving service reliability.
Hands-on incident management and postmortem practice; excellent cross-geo communication.
Willingness to participate in an on-call rotation (typically during daytime hours, including weekends/holidays).

Nice to have

Demonstrated success leading SLO/error-budget adoption and reliability programs for cloud services.
Experience operating a multi-region, follow-the-sun on-call model.
Background in chaos/resilience/performance testing and release validation.
Track record building or scaling SRE teams and influencing org-wide standards.
Familiarity with compliance frameworks common to SaaS.

Culture & Benefits

25 vacation days, 4 sick days, 21 paid medical leave days, plus 4 extra global hirify.global Days for self-care and 24 paid volunteer hours annually.
Premium private medical insurance for employees and dependents.
Daily meal vouchers for restaurants and groceries (180 CZK per working day).
Flexible cafeteria platform with thousands of lifestyle benefit options.
Multisport Card for gym and wellness, with family add-on options.
Annual public transport reimbursement up to a set limit.
Corporate mobile plan with optional family tariff.
Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops and learning events.
