Staff Software Engineer - Grafana Databases, Managed Services
Мэтч & Сопровод
Для мэтча с этой вакансией нужен Plus
Описание вакансии
TL;DR
Staff Software Engineer (Grafana Databases, Managed Services): Operating and evolving shared production-critical infrastructure powering Grafana Cloud’s database products (Mimir, Loki, Tempo) with an accent on high-throughput multi-cloud streaming clusters (WarpStream) and analytical/storage systems. Focus on diagnosing cross-layer failure modes, designing safe upgrade strategies, improving observability and automation, and leading reliability initiatives at massive scale.
Location: Remote (UK time zones only)
Salary: GBP 103,958 - 124,750
Company
Remote-first open-source company with 20M+ users of Grafana visualization tool and 3,000+ enterprise customers using Grafana Cloud and Enterprise stacks for observability.
What you will do
- Operate and evolve 100+ multi-cloud WarpStream clusters and related database infrastructure for metrics, logs, and traces ingestion.
- Diagnose and eliminate cross-layer failure modes like object storage latency, noisy neighbors, and query regressions.
- Design safe upgrade, rollout, and migration strategies at scale.
- Improve observability, automation, operational ergonomics, and SLOs for shared systems.
- Partner with database and platform teams on scaling, partitioning, fan-out, and performance.
- Serve as escalation point, handle on-call, and own vendor relationships.
- Lead technical direction, mentor engineers, and drive multi-layer incident resolution.
Requirements
- Located in UK time zones
- 8+ years engineering experience in SRE, platform, production, infrastructure, or distributed systems.
- Experience with high-throughput streaming (Kafka, WarpStream), analytical/storage backends (Postgres, ClickHouse), or large-scale databases.
- Strong Kubernetes in AWS/GCP/Azure; IaC (Helm, Terraform, Jsonnet).
- Proficiency in systems language (Go preferred); Linux internals, networking, cloud storage, performance.
- Experience leading complex efforts, incident response, post-mortems; clear communication.
Culture & Benefits
- 100% remote global culture with high trust, autonomy, transparency, and collaboration via video.
- RSUs for all, equity ownership, bonus; 30 days annual leave + shutdown days.
- AI coding assistants with budget (GPT, Claude, Gemini); developer productivity focus.
- Balanced on-call (12 daylight hours/day, global coverage).
- In-person onboarding; career growth pathways, approachable leadership.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →