Staff Site Reliability Engineer (AI)
Job description
TL;DR
Staff Site Reliability Engineer (AI): Building standards and frameworks that let feature teams operate their services autonomously, with an emphasis on enterprise-grade reliability and security. Focus on designing incident response frameworks, creating documentation, and guiding teams through operational maturity.
Location: USA - CA - Palo Alto, USA - NY - New York
Hiring Range: $169,800 - $233,500
Company
The company is one of the largest B2B AI-native companies, driving business outcomes across multiple industry verticals.
What you will do
- Review RFCs and PRDs to prevent downstream issues and provide architectural guidance.
- Create documentation and tooling that eliminate support dependencies.
- Design incident response frameworks and comprehensive playbooks.
- Define technical standards and operational frameworks.
- Guide teams through ownership maturity and operational best practices.
- Maintain multi-tenant, multi-cloud infrastructure.
Requirements
- 10+ years of proven experience in DevOps/SRE roles.
- Deep incident management and on-call system design experience.
- Expert-level knowledge of AWS, Kubernetes, and Infrastructure as Code (Terraform).
- Strong technical writing skills.
- Experience with API design patterns and service architecture.
- Proficiency in scripting (Go, Python, Bash, etc.) and CI/CD systems.