Company hidden
Posted 3 days ago

Senior Site Reliability Engineer (SRE)

Work format: Remote
Employment type: Full-time
Level: Senior
English: B2
Country: Argentina/Mexico/Colombia/Brazil
Listed in Hirify Global, a list of international tech companies.

Job description


TL;DR

Senior Site Reliability Engineer (SRE): Own the reliability, availability, and operational excellence of business-critical production systems, with an emphasis on SLO management, incident response, and observability strategy. The role focuses on building a strong reliability culture through automation, blameless postmortems, and continuous improvement of high-availability environments.

Location: Must be based in Latin America (Brazil, Mexico, Argentina, or Colombia)

Company

hirify.global is an IT services and operations company focused on delivering high-quality engineering solutions.

What you will do

  • Define and improve SLIs, SLOs, and Error Budgets to measure system reliability.
  • Develop and maintain comprehensive observability strategies including monitoring, logging, and tracing.
  • Lead Incident Command during production outages and conduct blameless postmortems.
  • Own and optimize on-call rotations, escalation policies, and alert tuning to reduce noise.
  • Establish production readiness standards and partner with engineering teams on scalability and disaster recovery.
  • Automate operational processes using Python, Go, or TypeScript.

Requirements

  • 5+ years of experience in SRE or Production Engineering roles.
  • Must be based in Latin America.
  • Proven experience managing SLOs, SLIs, and Error Budgets in high-availability environments.
  • Strong experience leading incident response and blameless postmortems.
  • Proficiency in software engineering with Python, Go, or TypeScript.
  • Strong written and verbal English communication skills.

Nice to have

  • Experience with Datadog, AWS, and Kubernetes.
  • Background in regulated industries like Healthcare or Financial Services.
  • Experience with capacity planning and disaster recovery.
  • Familiarity with PostgreSQL or SQL Server.

Culture & Benefits

  • Opportunity to own reliability strategy in a dedicated SRE role.
  • Focus on blameless culture and operational excellence.
  • Collaborative environment working with modern cloud and observability stacks.
  • Remote-first work arrangement.
