Company hidden
23 hours ago

Manager, Site Reliability Engineering

Work format
hybrid
Employment type
full-time
English
B2
Country
Czech Republic

Job description

TL;DR

Manager, Site Reliability Engineering: Build and lead a high-performing SRE team to ensure system reliability, scalability, and observability for the hirify.global Data Cloud, with an emphasis on driving SRE principles (SLIs/SLOs, error budgets) and incident excellence. The role focuses on making code contributions, leading software-first reliability investments, and operationalizing reliability strategies.

Location: Office-based in Czechia; remote work is possible for candidates located in the Czech Republic.

Company

hirify.global is the #1 global market leader in data resilience, providing data backup, recovery, portability, security, and intelligence to over 550,000 customers worldwide.

What you will do

  • Hire, onboard, and grow your SRE team, fostering a psychologically safe, blameless culture.
  • Establish and operationalize SLIs/SLOs and error budgets with service owners.
  • Ensure incident response readiness, lead/coordinate major incidents, and drive fast, high-quality postmortems.
  • Lead software-first reliability investments in observability, deployment safety, and resilience testing.
  • Drive platform improvements (IaC, CI/CD, Kubernetes) and internal tools that scale operations.
  • Track and cap toil, ensuring sustainable operational coverage and monitoring on-call health.

Requirements

  • 7+ years in Software, Platform, and/or Reliability Engineering with 2+ years managing engineers.
  • Demonstrable experience leading engineering teams to predictably deliver outcomes.
  • Experience with public cloud (Azure preferred), Kubernetes, IaC (Terraform, Pulumi), CI/CD (GitHub Actions, Argo CD, Azure DevOps), and observability (OpenTelemetry, Elastic, Datadog, Prometheus, Grafana).
  • Coding background with experience improving service reliability.
  • Hands-on incident management and postmortem practice; excellent cross-geo communication.
  • Willingness to participate in an on-call rotation (typically during daytime hours, including weekends/holidays).

Nice to have

  • Demonstrated success leading SLO/error-budget adoption and reliability programs for cloud services.
  • Experience operating a multi-region, follow-the-sun on-call model.
  • Background in chaos/resilience/performance testing and release validation.
  • Track record building or scaling SRE teams and influencing org-wide standards.
  • Familiarity with compliance frameworks common to SaaS.

Culture & Benefits

  • 25 vacation days, 4 sick days, 21 paid medical leave days, plus 4 extra global hirify.global Days for self-care and 24 paid volunteer hours annually.
  • Premium private medical insurance for employees and dependents.
  • Daily meal vouchers for restaurants and groceries (180 CZK per working day).
  • Flexible cafeteria platform with thousands of lifestyle benefit options.
  • Multisport Card for gym and wellness, with family add-on options.
  • Annual public transport reimbursement up to a set limit.
  • Corporate mobile plan with optional family tariff.
  • Opportunities to learn and grow through on-demand libraries (LinkedIn Learning, O’Reilly), mentoring, workshops, and learning events.
