5 days ago
Senior Site Reliability Engineer (Observability)
Job description
TL;DR
Senior Site Reliability Engineer (Observability): Own the reliability, scalability, and continued evolution of the observability platforms, with an emphasis on ELK (Elasticsearch, Logstash, Kibana), Grafana, and incident-driven operations. The focus is on maintaining SLOs, reducing toil through automation, and modernizing platform components using infrastructure-as-code.
Location: Austin
Company
The company builds and operates platforms that support engineering visibility and reliability.
What you will do
- Act as the primary escalation point for production support across the ELK Stack, Grafana, and New Relic.
- Own platform health, capacity planning, and performance tuning for on-premises observability infrastructure (Elasticsearch cluster management, index lifecycle, retention).
- Monitor and maintain SLOs, and support engineering onboarding with instrumentation, dashboards, and alert definitions.
- Manage patching, upgrades, and configuration management across the observability stack, including collaboration with security on hardening and vulnerability management.
- Contribute to platform engineering by designing and building automation/tooling to reduce toil and improve platform experience.
- Develop and maintain infrastructure-as-code (Terraform, Helm, Ansible, etc.) and help standardize logging/metrics/alerting practices at scale.
Requirements
- 5+ years of experience in SRE, DevOps, or platform engineering roles.
- Deep hands-on experience with the ELK Stack, including Elasticsearch cluster operations, Logstash pipeline development, Kibana, and index lifecycle management.
- Strong experience with Grafana, including data source integrations, dashboard design, and alerting.
- Solid understanding of observability principles and experience operating on-premises infrastructure, including capacity planning and the operational tradeoffs versus managed cloud services.
- Proficiency in Python for automation and tooling, plus familiarity with shell scripting.
- Strong Linux systems knowledge and comfort with configuration management tools (e.g., Ansible, Chef, Puppet).
Culture & Benefits
- Hybrid role: roughly half the time is spent on steady-state operations and platform support, and half on engineering projects.
- Benefits and educational initiatives, plus special celebrations of company history, culture, and growth.
- Equal opportunity employer.
Hiring process
- Interviews focused on SRE/observability ownership, incident resolution, and platform modernization/automation experience.
- Discussion of how experience maps to ELK/Grafana operations and infrastructure-as-code practices.