TL;DR
Staff Backend Engineer (Grafana, Distributed Systems): Operating and evolving critical shared database infrastructure for Grafana Cloud's next-generation products (Mimir, Loki, Tempo) with an accent on diagnosing cross-layer failure modes, designing safe upgrade strategies, and improving observability and automation. Focus on leading complex initiatives across high-throughput, multi-cloud infrastructure, establishing best practices for reliability, and driving architectural improvements.
Location: Remote, restricted to Canadian time zones
Salary: CAD 186,368 – CAD 223,642
Company
hirify.global is a remote-first, open-source company providing observability solutions with over 20M users and helping more than 3,000 companies manage their observability strategies.
What you will do
- Operate and evolve 100+ multi-cloud streaming clusters and related database infrastructure powering Grafana Cloud.
- Diagnose and eliminate cross-layer failure modes, improving observability and operational ergonomics.
- Design and implement safe upgrade and rollout strategies at scale for distributed systems.
- Partner with database and platform teams to ensure safe scaling, partitioning, and query performance.
- Lead complex initiatives, define technical direction, and establish best practices for SLOs, scaling limits, and reliability.
- Serve as a primary escalation point, participate in on-call rotations, and own vendor relationships for critical systems.
Requirements
- 8+ years of engineering experience, including meaningful time in SRE, platform, production, infrastructure engineering, or distributed systems roles.
- Experience with high-throughput streaming systems, analytical/storage backends, or large-scale database infrastructure (e.g., Kafka, Postgres, Cassandra, Snowflake).
- Strong Kubernetes experience in AWS, GCP, or Azure, and familiarity with infrastructure-as-code tooling (Helm, Terraform).
- Proficiency in at least one systems-oriented language (Go preferred).
- Ability to lead complex technical efforts, influence technical direction, and align teams around reliability improvements.
- Clear communication skills to collaborate across teams and work autonomously.
Culture & Benefits
- 100% Remote, global culture with talent from around the world.
- Work in a high-growth, innovation-driven environment with transparent communication.
- Open source roots and empowered teams with a high trust, low ego culture.
- Defined career growth pathways and approachable leadership.
- Global annual leave policy of 30 days, including 3 Grafana Shutdown Days.
- Access to modern AI coding assistants and frontier models with a company-funded usage budget.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →