Lead Site Reliability Engineer (Fintech)
Job description
TL;DR
Lead Site Reliability Engineer (AWS/Kubernetes): Drive reliability across a financial services platform with an emphasis on multi-region infrastructure, SRE best practices, and operational excellence. The role focuses on architecting enterprise-scale AWS environments, implementing production-scale Kubernetes patterns, and leading chaos engineering initiatives.
Location: Remote (USA). Applicants must be authorized to work in the U.S. (No sponsorship provided).
Salary: $114,000 - $165,300
Company
A financial services company focused on transforming financial lives, with a flexible work environment and an inclusive culture.
What you will do
- Lead cross-functional reliability initiatives and define organizational SRE best practices, tools, and methodologies.
- Architect enterprise-scale, multi-region AWS infrastructure and production-scale Kubernetes patterns.
- Establish and operate SLOs, SLIs, and error budgets to drive prioritization decisions.
- Serve as incident commander for major incidents and lead disaster recovery planning for critical infrastructure.
- Build shared Infrastructure as Code foundations using Terraform modules and patterns.
- Establish observability standards using Datadog and Splunk and drive FinOps initiatives to optimize cloud spend.
Requirements
- 7 to 10 years of Site Reliability Engineering experience with demonstrated technical leadership.
- Expert knowledge of AWS (large-scale, multi-region) and deep Kubernetes expertise.
- Mastery of Terraform and strong software engineering background in Python and/or Go.
- Extensive experience with observability platforms like Datadog and Splunk.
- Proven track record of leading major incidents and conducting effective postmortems.
- Must be authorized to work for any employer in the U.S. (visa sponsorship is not available).
Nice to have
- Financial services industry experience and understanding of SOC 2, PCI DSS, or FINRA compliance.
- Professional AWS certifications or Kubernetes certifications (CKA, CKAD, CKS).
- Experience implementing SRE at organizations with 500+ engineers.
- Background in chaos engineering, game days, and service mesh management.
Culture & Benefits
- Comprehensive medical, dental, vision, and life insurance.
- 401(k) plan with generous company matching contributions (up to 6%).
- Tuition reimbursement up to $5,250 per year.
- Generous paid time off, including 10 company holidays and 3 floating holidays.
- Paid parental leave and short/long-term disability programs.
- 16 hours of paid volunteer time per calendar year.