Senior Site Reliability Engineer (Observability)
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Senior Site Reliability Engineer (Observability): Responsible for the reliability, scalability, and evolution of on-premises observability platforms including ELK Stack and Grafana with an accent on operations ownership and platform engineering. Focus on managing cluster health, capacity planning, automation tooling, and enforcing observability best practices across engineering teams.
Location: Hybrid in Austin or Charlotte (US)
Company
Asset management firm with comprehensive benefits, educational initiatives, and a focus on equal opportunity.
What you will do
- Serve as primary escalation point for production support on ELK Stack, Grafana, and New Relic.
- Own platform health, capacity planning, performance tuning, and SLO monitoring for observability infrastructure.
- Support engineering teams with onboarding, instrumentation, dashboards, and alerts.
- Design and build automation tooling to reduce toil and improve platform experience.
- Lead modernization initiatives like ingestion pipelines, scaling, and standardizing dashboards/alerts.
- Develop infrastructure-as-code with Terraform, Helm, Ansible and contribute to platform roadmap.
Requirements
- Bachelor’s degree in technical field or equivalent experience
- 5+ years in SRE, DevOps, or platform engineering
- Deep hands-on with ELK Stack: Elasticsearch operations, Logstash pipelines, Kibana, index lifecycle.
- Strong Grafana experience: data sources, dashboards, alerting.
- Observability principles, on-premises infrastructure operations, capacity planning.
- Python proficiency for automation, Linux systems, configuration management (Ansible, etc.).
- Incident resolution and clear communication under pressure; bias toward automation.
Nice to have
- Prometheus experience.
- New Relic administration or APM.
- Log shipping agents (Beats, Fluentd, Fluent Bit).
- Distributed tracing (OpenTelemetry).
- Cloud observability and hybrid strategies.
- Governing observability standards in large organizations.
Culture & Benefits
- Hybrid work model balancing operations and engineering projects.
- On-call rotations with runbooks and escalation procedures.
- Comprehensive benefits, educational initiatives, and career support programs.
- Focus on excellence, automation, and relentless platform improvement.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →