Principal Site Reliability Engineer (Fintech)
ΠΡΡΡ & Π‘ΠΎΠΏΡΠΎΠ²ΠΎΠ΄
ΠΠ»Ρ ΠΌΡΡΡΠ° Ρ ΡΡΠΎΠΉ Π²Π°ΠΊΠ°Π½ΡΠΈΠ΅ΠΉ Π½ΡΠΆΠ΅Π½ Plus
ΠΠΏΠΈΡΠ°Π½ΠΈΠ΅ Π²Π°ΠΊΠ°Π½ΡΠΈΠΈ
TL;DR
Principal Site Reliability Engineer (Fintech): Design, build, and maintain secure, highly available infrastructure for a transaction-heavy fintech platform with an accent on AWS cloud, Kubernetes orchestration, and observability tooling. Focus on leading incident response, defining SLOs, and driving reliability improvements through automation and root cause analysis.
Hybrid: Onsite 4x per month (Makati City, Manila). Shift Options: 5:00 AM - 2:00 PM or 6:00 AM - 3:00 PM (Manila). Workdays: MonβFri, TueβSat, or WedβSun.
Company
Global fintech leader providing technology solutions for financial services, focusing on high-availability platforms.
What you will do
- Design, build, and maintain secure, scalable infrastructure using IaC, automation, and CI/CD pipelines.
- Troubleshoot production issues, perform performance tuning, and improve monitoring/observability.
- Lead high-severity incident response, RCA, and implementation of reliability improvements.
- Act as technical authority for SRE/DevOps practices, guiding architectural decisions on AWS, CI/CD, and tooling.
- Mentor SRE team, delegate tasks, and promote ownership and continuous improvement.
Requirements
- 10+ years in SRE, DevOps, or Infrastructure Engineering as senior IC/SME.
- Deep hands-on with AWS, Terraform/Chef, CI/CD (Jenkins/GitHub Actions/GitLab CI), Docker/Kubernetes.
- Observability (Datadog/ELK/Splunk), Python/Bash scripting, strong SQL (Aurora PostgreSQL preferred).
- Kafka (AWS MSK/Confluent), incident response, on-call operations, distributed systems, networking, security.
Nice to have
- AWS, Kubernetes, or Kafka certifications.
- Hybrid/multi-cloud experience.
- Background in high-transaction/customer-facing systems.
Culture & Benefits
- Collaborative, inclusive environment empowering authentic contributions.
- Hands-on leadership in small SRE team (3-6 engineers).
- Focus on continuous improvement, ownership, and high-impact technical decisions.
- Flexible shift and workday options.
ΠΡΠ΄ΡΡΠ΅ ΠΎΡΡΠΎΡΠΎΠΆΠ½Ρ: Π΅ΡΠ»ΠΈ ΡΠ°Π±ΠΎΡΠΎΠ΄Π°ΡΠ΅Π»Ρ ΠΏΡΠΎΡΠΈΡ Π²ΠΎΠΉΡΠΈ Π² ΠΈΡ ΡΠΈΡΡΠ΅ΠΌΡ, ΠΈΡΠΏΠΎΠ»ΡΠ·ΡΡ iCloud/Google, ΠΏΡΠΈΡΠ»Π°ΡΡ ΠΊΠΎΠ΄/ΠΏΠ°ΡΠΎΠ»Ρ, Π·Π°ΠΏΡΡΡΠΈΡΡ ΠΊΠΎΠ΄/ΠΠ, Π½Π΅ Π΄Π΅Π»Π°ΠΉΡΠ΅ ΡΡΠΎΠ³ΠΎ - ΡΡΠΎ ΠΌΠΎΡΠ΅Π½Π½ΠΈΠΊΠΈ. ΠΠ±ΡΠ·Π°ΡΠ΅Π»ΡΠ½ΠΎ ΠΆΠΌΠΈΡΠ΅ "ΠΠΎΠΆΠ°Π»ΠΎΠ²Π°ΡΡΡΡ" ΠΈΠ»ΠΈ ΠΏΠΈΡΠΈΡΠ΅ Π² ΠΏΠΎΠ΄Π΄Π΅ΡΠΆΠΊΡ. ΠΠΎΠ΄ΡΠΎΠ±Π½Π΅Π΅ Π² Π³Π°ΠΉΠ΄Π΅ β