Site Reliability Engineer (AWS)
Job description
TL;DR
Site Reliability Engineer (AWS): Design and maintain highly available, scalable cloud infrastructure that supports core data platforms and AI workloads, with an emphasis on Infrastructure as Code, observability, and database performance. The role focuses on managing incident response, optimizing high-traffic multi-tenant systems, and ensuring reliability under heavy load.
Location: Fully Remote
Compensation: €60,000–€80,000
Company
A B-Corporation startup empowering companies to manage their climate impact through advanced technology.
What you will do
- Design, implement, and maintain secure, scalable cloud infrastructure for AI and data platforms using Terraform.
- Expand observability strategies and improve monitoring/alerting via Datadog.
- Build and optimize infrastructure to support machine learning model training and deployments.
- Participate in on-call rotations, incident response, and post-mortem reviews.
- Collaborate with engineering teams to scale high-traffic, multi-tenant systems.
- Implement security protocols and ensure compliance with SOC 2 and ISO 27001 standards.
Requirements
- 3+ years of professional DevOps or SRE experience (5+ preferred).
- Strong knowledge of AWS (ECS, Fargate), Docker, and Terraform.
- Proven experience with PostgreSQL at scale, including sharding or clustering.
- Strong background in high-traffic, multi-tenant systems.
- Fluency in English (French is a plus).
- Experience with observability tools (Datadog expertise strongly preferred).
Nice to have
- Experience with Ruby on Rails and Kubernetes/ARC.
- Background in Change Data Capture (CDC) and data pipelines.
- Familiarity with Snowflake.
Culture & Benefits
- Flexible work model promoting a balanced remote culture.
- Opportunity to work at an early-stage startup with global teams.
- Emphasis on climate change impact and sustainable B-Corp business practices.
- Collaborative environment through an internal SRE guild.