Staff Site Reliability Engineer
Job description
TL;DR
Staff Site Reliability Engineer (AWS/Kubernetes): Set the direction for Oura’s AWS infrastructure strategy and operate with ownership across multiple teams and systems, with an emphasis on reliability, scalability, and cost-efficient architecture. Focus on building organization-wide observability, fault-tolerant systems, and deployment/release pipelines while leading complex cross-team infrastructure initiatives and incident response.
Location: Remote (United States)
Salary: $169,150-$233,000 (US regions 1–3)
Company
Oura builds connected health products that help people understand readiness, activity, and sleep quality.
What you will do
- Set technical direction and long-term architecture for Oura’s AWS infrastructure, covering reliability, scalability, and cost efficiency.
- Lead infrastructure-as-code standards, evolve shared platform patterns, and drive the migration of services onto best-practice patterns.
- Architect and implement organization-wide observability (monitoring and alerting) and design fault-tolerant systems that degrade gracefully.
- Plan and execute complex multi-team infrastructure initiatives, including phased rollouts and cross-team migrations.
- Own deployment pipeline evolution and dependency management to enable fast, robust, and safe testing and releases.
- Lead operational excellence: define SLAs, manage incident response for the most complex production issues, and drive continuous improvement.
Requirements
- 8+ years of backend development and infrastructure experience, including leading complex cross-team technical initiatives.
- Experience architecting and operating data-intensive distributed systems in production at scale.
- Deep AWS experience running, monitoring, and debugging production systems; strong familiarity with EKS, RDS, S3, SQS, Kinesis, Lambda, and DynamoDB.
- Kubernetes expertise with experience running and orchestrating containers on EKS (or similar) and optimizing for reliability, security, and cost.
- Experience with serverless architectures and robust deployment pipelines (GitHub Actions is a bonus).
- Strong communication and leadership skills, including mentoring engineers and mediating technical disagreements.
Nice to have
- Experience in healthcare, wearable technology, or supporting large enterprise customers.
- Database management and data pipeline optimization experience.
- Programming skills in Python, Go, or JavaScript/TypeScript.
- Experience defining and driving SLO/SLI frameworks across an organization.
- Track record contributing to or leading open-source infrastructure projects.
Culture & Benefits
- Health, dental, vision insurance, and mental health resources.
- 20 days paid time off plus 13 paid holidays and 8 days flexible wellness time off; paid sick leave and parental leave.
- Oura Ring provided for personal use and employee discounts for friends & family.
- Market-based pay approach with US location tiers; recruiter determines your tier based on US location.
- Remote role with offices in San Francisco and San Diego for hybrid/office preferences.
Hiring process
- Interviews to assess technical leadership, architecture, reliability practices, and cross-team collaboration.
- Final offer stage: official offers are sent through Docusign after a verbal offer.