Senior Site Reliability Engineer (SRE)
Job description
TL;DR
Senior Site Reliability Engineer (SRE): Own the reliability, availability, and operational excellence of business-critical production systems, with an emphasis on SLO management, incident response, and observability strategy. The role focuses on building a strong reliability culture through automation, blameless postmortems, and continuous improvement of high-availability environments.
Location: Must be based in Latin America (Brazil, Mexico, Argentina, or Colombia)
Company
The company is an IT services and operations company focused on delivering high-quality engineering solutions.
What you will do
- Define and improve SLIs, SLOs, and Error Budgets to measure system reliability.
- Develop and maintain comprehensive observability strategies including monitoring, logging, and tracing.
- Lead Incident Command during production outages and conduct blameless postmortems.
- Own and optimize on-call rotations, escalation policies, and alert tuning to reduce noise.
- Establish production readiness standards and partner with engineering teams on scalability and disaster recovery.
- Automate operational processes using Python, Go, or TypeScript.
Requirements
- 5+ years of experience in SRE or Production Engineering roles.
- Must be based in Latin America.
- Proven experience managing SLOs, SLIs, and Error Budgets in high-availability environments.
- Strong experience leading incident response and blameless postmortems.
- Proficiency in software engineering with Python, Go, or TypeScript.
- Strong written and verbal English communication skills.
Nice to have
- Experience with Datadog, AWS, and Kubernetes.
- Background in regulated industries like Healthcare or Financial Services.
- Experience with capacity planning and disaster recovery.
- Familiarity with PostgreSQL or SQL Server.
Culture & Benefits
- Opportunity to own reliability strategy in a dedicated SRE role.
- Focus on blameless culture and operational excellence.
- Collaborative environment working with modern cloud and observability stacks.
- Remote-first work arrangement.