Senior Site Reliability Engineer (Government & Sovereign Cloud)
Job description
TL;DR
Senior Site Reliability Engineer (Government & Sovereign Cloud): Supports the Data Cloud, a new SaaS platform, with an emphasis on the Government and Sovereign Cloud environment. The role focuses on defining reliability engineering practices, mapping systems, writing runbooks, and setting baselines.
Location: Office-based in San Jose, CA
Salary: $109,800 to $252,500 USD, depending on US geographic zone
Company
The company describes itself as the Data and AI Trust Company. It helps organizations ensure their data and AI are fully understood, secured, and resilient, so they can accelerate safe AI at scale.
What you will do
- Get up to speed on the full platform, including all VDC workloads, dependencies, and risk areas, through code, docs, and conversations.
- Design infrastructure for high availability and fault tolerance on Azure (including Azure Government).
- Run incident response and blameless postmortems, turning incidents into improvements.
- Close observability gaps by defining instrumentation requirements and driving implementation.
- Build and maintain testing, canary deployment, and release validation pipelines.
- Work across product, platform, security, legal, compliance, and operations teams.
Requirements
- 7+ years in Software Engineering, with 3+ years in SRE, Platform Engineering, or similar.
- Experience with Government or Sovereign Cloud (e.g., Azure Government, AWS GovCloud).
- Experience in regulated compliance environments (government, financial, or healthcare).
- Strong experience building and running production services on cloud infrastructure (Azure preferred, including Azure Government).
- Ability to learn large, complex platforms quickly with limited guidance.
- Programming skills in one or more of: TypeScript/JS, Go, Java, C#, or similar.
Nice to have
- Experience on B2B SaaS platforms in regulated or government markets.
- Background in chaos engineering, resilience testing, or performance/load testing.
- Experience building an SRE or reliability function from scratch.
- Familiarity with AI-first development workflows.
Culture & Benefits
- Unlimited paid time off, 12 paid holidays, plus 4 extra global e Days for self-care and 24 paid volunteer hours annually through Cares.
- Paid parental leave: 8 weeks for all parents, 16 weeks for birthing parents.
- Medical, dental, and vision coverage starting on your first day.
- 401(k) retirement plan with company matching contributions.
- Opportunities to learn and grow through on-demand libraries, mentoring, workshops, and learning events.