Senior Site Reliability Engineer (Observability)
Job description
TL;DR
Senior Site Reliability Engineer (Observability): Own and evolve the company's observability stack, including OpenTelemetry and Datadog, to provide reliable, actionable metrics, traces, and logs across services, with an emphasis on AI-powered agents and automation. Focus on driving adoption of SLOs, distributed tracing, and structured logging, and on improving incident response processes.
Location: Remote-first (United States; BC & ON, Canada; Argentina)
Company
Building the world’s leading AI-native Digital Experience Platform as a remote-first company.
What you will do
- Join the Observability team to ensure engineers have tools, data, and practices for application and hosting health.
- Dive into the main application in TypeScript, Node, or Go to debug and fix production issues.
- Build and maintain AI-powered agents to surface insights, reduce alert fatigue, and accelerate incident resolution.
- Guide teams on instrumentation, SLOs, tracing, and logging for confident production deployments.
- Participate in on-call, incident response, and automate workflows to reduce toil.
- Partner with engineering teams to define and improve observability practices.
Requirements
- Business-level fluency in reading, writing, and speaking English
- BS/BA or relevant experience
- 5+ years building, maintaining, and debugging distributed systems in customer-facing environments
- Hands-on with observability tools like Datadog, Grafana, Prometheus, Elasticsearch
- Experience with OpenTelemetry or similar for metrics, traces, profiles, logs
- Experience with SLOs/SLIs at scale, AWS/GCP, container architectures (Docker, Kubernetes), and IaC (Terraform, Pulumi)
- Full-stack contributions with React, Node.js, MongoDB/PostgreSQL
Nice to have
- AI agents for observability data (root cause analysis, alerting, querying)
- OpenTelemetry, Kubernetes, Pulumi specifically
- Improving on-call and incident response
Culture & Benefits
- Equity (RSUs) for permanent employees
- Comprehensive health, dental, vision coverage
- Paid parental leave (12 weeks), family planning support
- Flexible vacation, holidays, sabbatical
- Mental health resources, wellness stipends
- 401(k) match (US), retirement support globally, annual bonus