Company hidden
Posted 3 days ago

Staff Site Reliability Engineer - Site Experience

Work format
Onsite
Employment type
Full-time
Level
Senior
English
B2
Country
Ireland
Vacancy from the Hirify RU Global list of companies with Eastern European roots

Job description


TL;DR

Staff Site Reliability Engineer (Site Experience): Lead reliability engineering initiatives for critical, internet-scale user-facing systems, with an emphasis on APIs, content delivery, feed generation, search, messaging, and real-time experiences. The role centers on designing highly available architectures, reducing operational risk, driving automation, and leading incident response.

Location: Dublin, Ireland

Company

hirify.global is a community of communities built on shared interests. It is home to 100,000+ active communities and 126 million daily active unique visitors, making it one of the internet’s largest sources of information.

What you will do

  • Drive reliability, scalability, and operational excellence for critical user-facing systems including APIs, content delivery, feeds, search, messaging, and real-time experiences.
  • Partner with product and infrastructure teams to architect systems for massive global load, guiding decisions on failover, redundancy, degradation, traffic management, and capacity planning.
  • Identify risks and bottlenecks, build mitigation strategies, and drive improvements to reduce incidents and enhance service health.
  • Build automation and tooling that eliminate repetitive work and improve deployment safety, incident response, and reliability guardrails.
  • Lead incident response, blameless postmortems, root cause analysis, and long-term fixes.
  • Champion best practices for SLIs/SLOs, capacity management, release engineering, and operational maturity; mentor engineers to strengthen the team's reliability culture.

Requirements

  • 8+ years in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
  • Strong collaboration and communication skills to influence technical direction across teams.
  • Experience supporting high-traffic, user-facing production environments.
  • Deep understanding of distributed systems, networking, Linux systems, or cloud-native architectures.
  • Strong programming skills in Go, Python, or similar.
  • Strong knowledge of observability (metrics, logging, tracing, alerting), SLOs, automation, incident management, and performance optimization.

Nice to have

  • Experience with internet-scale traffic, Kubernetes, containers, and cloud infrastructure.
  • Familiarity with Prometheus, Grafana, OpenTelemetry, Envoy, Kafka, ClickHouse, Cassandra, Redis, CDN optimization, or global infrastructure.
  • Open source contributions or technical community participation.
  • Experience leading large-scale incident response and operational transformations.

Culture & Benefits

  • Global benefits including workspace support, professional development, caregiving, family planning, gender-affirming care, mental health & coaching.
  • Private medical, dental, vision benefits; personal retirement savings with matching; cycle to work and tax saver schemes.
  • Flexible vacation, paid volunteer time off, generous paid parental leave.
