Senior Site Reliability Engineer (Azure Government)
Job description
TL;DR
Senior Site Reliability Engineer (Azure Government): Building and managing the SRE function for a secure, regulated SaaS platform, with an emphasis on sovereign cloud infrastructure and government compliance standards. Focus on designing highly available Azure architectures, driving incident response, and implementing advanced observability and automation in restricted environments.
Location: Must be based in the USA (Hybrid position in San Jose, CA).
Compensation: $151,200 to $347,500 USD (based on geographic zone and experience).
Company
The company is a global leader in data resilience, security posture management, and AI-enabled data protection, supporting over 550,000 customers worldwide.
What you will do
- Define and implement SRE practices, including SLIs, SLOs, and error budgets for government-grade cloud environments.
- Design and maintain highly available infrastructure on Azure Government, ensuring fault tolerance and compliance.
- Lead incident response and perform blameless postmortems to drive continuous improvement.
- Build and maintain IaC, CI/CD pipelines, and configuration management in restricted environments.
- Drive observability implementation, including monitoring standards, telemetry, and alerting.
- Mentor engineering team members and advocate for SRE best practices across the organization.
Requirements
- 7+ years in Software Engineering with 3+ years in SRE or Platform Engineering.
- Experience with Government or Sovereign Cloud environments (e.g., Azure Government, AWS GovCloud).
- Knowledge of regulated compliance frameworks (FedRAMP, CMMC, HIPAA, PCI-DSS).
- Strong proficiency with IaC (Terraform) and container orchestration (Kubernetes).
- Experience with observability tools (Prometheus, Grafana, OpenTelemetry) and CI/CD/GitOps workflows.
- Must be based in the US and capable of navigating complex, high-security architecture documentation independently.
Nice to have
- Experience building an SRE function from the ground up.
- Background in chaos engineering or large-scale resilience testing.
- Familiarity with AI-first development workflows and LLM-powered infrastructure automation.
Culture & Benefits
- Unlimited paid time off plus 12 paid holidays and annual e days.
- Comprehensive medical, dental, and vision insurance effective from the first day.
- 401(k) retirement plan with company matching.
- Paid parental leave for all parents.
- Extensive professional development through LinkedIn Learning, O’Reilly, and mentorship.
- Access to mental health support, 24/7 virtual vet care, and identity protection services.