Staff Production Operations Engineer (AI)
Job Description
TL;DR
Staff Production Operations Engineer (AI): Drive the company's reliability operations program and automate operational toil with AI agents, with an emphasis on incident lifecycle management and global on-call coordination. The role focuses on building automation on the Agentic Data Plane platform, reducing alert noise, and systematically improving system reliability.
Location: Must be based in the United States or Canada
Salary: $211,000 – $256,000
Company
The company is pioneering the Agentic Data Plane (ADP), a new category of AI infrastructure that connects AI agents with enterprise data and systems.
What you will do
- Drive process improvements across the incident lifecycle, including severity models, triage enforcement, and alert noise reduction.
- Coordinate the global on-call program, managing schedules, shadow rotations, and engineer onboarding.
- Facilitate blameless post-incident reviews, document findings, and track follow-up completion.
- Build AI agents to automate operational toil, including on-call automation and incident summarization.
- Maintain and evolve runbooks, playbooks, and incident process documentation.
Requirements
- Must be based in the United States or Canada.
- 5+ years of experience in SRE, DevOps, or production operations in large-scale, highly reliable environments.
- Proficiency in Go or a comparable systems language.
- Hands-on experience with incident management tools (e.g., PagerDuty, incident.io) and observability stacks (e.g., Datadog, Grafana).
- Working knowledge of AWS, Azure, or GCP and infrastructure as code (IaC).
- Experience with AI-assisted software development workflows using tools like Claude Code.
Nice to have
- Hands-on experience building agents or automations using LLMs.
- Familiarity with , Apache Kafka, or other streaming infrastructure.
- Prior experience in a fast-growing B2B infrastructure or developer tools company.
Culture & Benefits
- People-first organization with a culture based on trust, transparency, and kindness.
- High-impact environment with a budget for the latest AI tools.
- Diverse, globally distributed team structure.