Staff Software Engineer (Platform SysEng)
Job description
TL;DR
Staff Software Engineer (Platform SysEng): Build and operate highly available, low-latency infrastructure for an observability platform, with an emphasis on distributed systems maturity, scalability, and reliability. Focus on system design tradeoffs, cloud-native platform operations (Kubernetes/IaC), and improving performance and efficiency end-to-end as the platform scales.
Location: Remote in the United States (USA time zones; EST + CST highly preferred)
Salary: USD 174,986 - USD 209,983 base
Company
The company builds Grafana Cloud, a fully managed observability platform based on open source and open standards.
What you will do
- Own the maturity and scalability of the Platform SysEng squad, improving platform performance, reliability, and efficiency.
- Deliver and operate distributed systems that process and store metrics, logs, and traces at very high throughput.
- Improve platform scalability by reducing new region build timelines to meet customer demands.
- Define and drive reliability work end-to-end, including SLOs/SLIs, capacity planning, and performance tuning.
- Work across squads and teams to align priorities and ship changes from design docs through integration testing to production.
- Participate in on-call rotations to ensure production service health.
Requirements
- Location/Time zone: must be located in the United States; USA time zones required (EST + CST highly preferred)
- Proven delivery of large distributed systems, including shipping and operating complex systems across multiple teams with technical leadership impact.
- Demonstrable system design experience with deep understanding of latency, consistency, availability, scaling, and cost tradeoffs.
- Hands-on cloud and platform experience with cloud-native architectures (microservices, containers/Kubernetes, IaC) and operational practices.
- Reliability and performance ownership: comfortable defining SLOs/SLIs, capacity planning, tuning performance, and driving reliability work end-to-end.
- Strong coding and design skills; Go is used, but experience with comparable languages (Python, C, C++, Rust, or similar) translates well.
Nice to have
- Experience with open source or community-based projects.
- Familiarity with Kubernetes scheduling and Karpenter.
- Terraform and/or Crossplane experience.
- Experience with Tanka and/or Jsonnet.
Culture & Benefits
- 100% remote company with a global culture; consensus-based collaboration and strong communication.
- On-call rotations as part of operating production services.
- Developer productivity support, including AI coding assistants with a company-funded usage budget and strong code review/quality standards.
- RSUs for shared outcomes and ownership; base compensation plus equity/bonus (if applicable).
- Global annual leave policy of 30 days, with 3 days reserved for Grafana Shutdown Days.
- In-person onboarding to help new hires ramp up.
Hiring process
- Interview process includes assessment of level, experience, and skillset; compensation is discussed at the beginning for non-listed locations.
- Recruitment may use AI tools to match CV information, with manual review by the recruitment team.