Company hidden
5 days ago

Senior Site Reliability Engineer (Observability)

Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US

Job description

TL;DR

Senior Site Reliability Engineer (Observability): Own the observability platforms and their reliability, scalability, and continued evolution, with an emphasis on the ELK Stack (Elasticsearch, Logstash, Kibana), Grafana, and incident-driven operations. The role focuses on maintaining SLOs, reducing toil through automation, and modernizing platform components using infrastructure-as-code.

Location: Austin

Company

hirify.global builds and operates platforms that support engineering visibility and reliability.

What you will do

  • Act as the primary escalation point for production support across the ELK Stack, Grafana, and New Relic.
  • Own platform health, capacity planning, and performance tuning for on-premises observability infrastructure (Elasticsearch cluster management, index lifecycle, retention).
  • Monitor and maintain SLOs, and support engineering onboarding with instrumentation, dashboards, and alert definitions.
  • Manage patching, upgrades, and configuration management across the observability stack, including collaboration with security on hardening and vulnerability management.
  • Contribute to platform engineering by designing and building automation/tooling to reduce toil and improve platform experience.
  • Develop and maintain infrastructure-as-code (Terraform, Helm, Ansible, etc.) and help standardize logging/metrics/alerting practices at scale.

Requirements

  • 5+ years of experience in SRE, DevOps, or platform engineering roles.
  • Deep hands-on experience with the ELK Stack, including Elasticsearch cluster operations, Logstash pipeline development, Kibana, and index lifecycle management.
  • Strong experience with Grafana, including data source integrations, dashboard design, and alerting.
  • Solid understanding of observability principles and experience operating on-premises infrastructure, including capacity planning and the operational tradeoffs versus managed cloud.
  • Proficiency in Python for automation and tooling, plus familiarity with shell scripting.
  • Strong Linux systems knowledge and comfort with configuration management tools (e.g., Ansible, Chef, Puppet).

Culture & Benefits

  • Hybrid role with roughly half the time on steady-state operations and platform support, and half on engineering projects.
  • Benefits and educational initiatives, plus special celebrations of company history, culture, and growth.
  • Equal opportunity employer.

Hiring process

  • Interviews focused on SRE/observability ownership, incident resolution, and platform modernization/automation experience.
  • Discussion of how experience maps to ELK/Grafana operations and infrastructure-as-code practices.
