Site Reliability Engineer (Postgres)
Job description
TL;DR
Site Reliability Engineer (Postgres/AWS): Establish reliability practices and frameworks that let engineering teams own their reliability, with an emphasis on SLIs, SLOs, and error budget policies. The role focuses on designing sustainable on-call practices, automating operational toil, and driving systemic fixes from incident postmortems.
Location: Fully Remote (Global)
Company
Supabase is the Postgres development platform providing a complete backend solution including Database, Auth, Storage, Edge Functions, Realtime, and Vector Search.
What you will do
- Partner with service teams to define meaningful SLIs and SLOs and build error budget policies to guide engineering decisions.
- Own and evolve the Operational Readiness Review (ORR) process for new services and major changes.
- Strengthen the incident-to-improvement pipeline by connecting postmortem findings to systemic fixes.
- Act as the reliability expert for architecture reviews, failure mode analysis, and resilience design.
- Identify and quantify operational toil across the organization and advocate for automation to eliminate it.
- Help teams design sustainable on-call practices, improving alert quality and reducing noise.
Requirements
- 7+ years of experience in SRE, production engineering, or reliability-focused roles.
- Proven experience shaping SRE practices and driving adoption across engineering teams.
- Software engineering mindset with the ability to write code and build tools.
- Hands-on experience operationalizing SLOs/SLIs at scale and implementing error budget policies.
- Deep expertise in incident response, postmortem facilitation, and systemic improvement.
- Proficiency with AWS and infrastructure-as-code (Pulumi preferred, Terraform/CDK acceptable).
Nice to have
- Experience with Kubernetes-based platform operations.
- Familiarity with OpenTelemetry, VictoriaMetrics, Grafana, or similar observability tooling.
- Experience building developer-facing reliability tooling such as SLO dashboards or DORA metrics tracking.
Culture & Benefits
- Fully remote work, with a WeWork membership or a co-working allowance.
- Equity ownership (ESOP) for every team member.
- 100% health insurance coverage for employees and 80% for dependents.
- Annual company-wide off-sites in different cities.
- Async-first work environment with a professional development allowance for learning.
Hiring process
- Application review followed by a short intro video call.
- Up to four interviews with team leads, peers, cross-functional partners, and leadership.
- Final decision: either follow-up questions or a direct offer.