Senior Site Reliability Engineer (Resilience) - Platform Resilience

$154,800 - $195,600

Work format

Remote (USA only)

Employment type

Full-time

Grade

Senior

English

Country

Job description

Text:

TL;DR

Senior Site Reliability Engineer (Resilience) - Platform Resilience: Building and maintaining highly reliable, scalable multi-cloud infrastructure powering mission-critical SaaS services, with an emphasis on automation, observability, and incident prevention. Focus on designing tools and frameworks, conducting root cause analysis, and enhancing system resilience in large-scale distributed environments.

Location: Remote in the United States

Salary: $154,800 to $195,600 USD

Company

Global Platform Engineering organization supporting large-scale distributed SaaS and platform services across multi-cloud environments.

What you will do

Design, build, and maintain reliable multi-cloud platform infrastructure for large-scale SaaS services
Lead initiatives on automation, reliability engineering, and system resilience improvements
Develop tools, software, and automation frameworks to boost infrastructure efficiency
Respond to incidents and follow up with root cause analysis, problem management, and prevention
Participate in a global on-call rotation using a follow-the-sun model
Collaborate with engineering teams on infrastructure challenges and observability enhancements
Drive infrastructure-as-code practices, documentation, and operational excellence

Requirements

Experience as a Site Reliability Engineer, Platform Engineer, or Software Engineer working on large-scale distributed systems
Strong software engineering background for designing automation and infrastructure solutions
Hands-on experience with public cloud platforms and managed Kubernetes environments
Proficiency in at least one programming language (e.g., Go, Python)
Strong knowledge of Linux systems administration, containerized environments (Docker), and cloud-native architectures
Familiarity with observability tools (Prometheus, Grafana), incident response, and reliability best practices
Strong communication skills for globally distributed teams

Nice to have

Experience with Infrastructure-as-Code tools such as Terraform or Crossplane
Experience operating or supporting SaaS platforms in production
Experience building or scaling Kubernetes across multiple cloud providers

Culture & Benefits

Competitive base salary with equity participation
Company-matched 401(k) up to 6%
Comprehensive health coverage, paid parental leave (minimum 16 weeks), and generous PTO
Remote-friendly global work environment with flexible arrangements
Focus on employee well-being, work-life balance, volunteer time off, and inclusive culture