Senior Reliability Engineer (AWS/Kubernetes)

Work format

remote (Mexico only)

Employment type

full-time

Level

senior

English

Country

Mexico

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Reliability Engineer (AWS/Kubernetes): operate, observe, and improve the reliability of distributed systems, with an emphasis on observability, operational maturity, and automated responses to system behavior. The focus is on designing observability strategies, defining SLIs/SLOs, enhancing autoscaling and self-healing mechanisms, and performing root cause analysis of production incidents.

Location: Mexico City, 100% remote

Company

A leading nearshore staff augmentation provider headquartered in New York, with 600+ tech professionals based in Latin America, partnering with U.S. companies on digital transformation projects.

What you will do

Design and improve observability strategies including metrics, logs, traces, alerts, and dashboards across services.
Analyze system behavior in production to identify failure modes, bottlenecks, and risks.
Maintain AWS CDK/CDK8s constructs focused on observability, autoscaling, and safeguards.
Operate core platform components such as VPC, EKS, RDS, OpenSearch, MSK, and Kubernetes add-ons.
Define SLIs, SLOs, alerting strategies, and automated responses including self-healing and runbooks.
Collaborate on incident investigations, root cause analysis, and CI/CD for IaC, applying SRE principles throughout.

Requirements

5+ years in SRE, Platform Engineering, or Infrastructure roles with hands-on production systems support
Strong observability experience: metrics, logs, traces, dashboards, alerts for complex systems
Hands-on with AWS (VPC, IAM, RDS, MSK, S3, CloudWatch) and Kubernetes (Helm, RBAC, ServiceAccounts)
Fluency in Python and IaC with AWS CDK, CDK8s, or an equivalent tool
Experience with Prometheus, Grafana, alert tuning, and incident-driven monitoring improvements
Experience improving existing systems for operational excellence and reliability

Nice to have

Experience supporting Spark on Kubernetes, Argo, or Kafka-based batch pipelines

Culture & Benefits

100% remote work with autonomy focused on results
Highly competitive USD pay
Paid time off for well-being
Work with top U.S. companies on high-impact projects
Diverse global network across 25+ countries, with an emphasis on work-life balance and engagement activities

Be careful: if an employer asks you to log in to their system using iCloud/Google, send a code or password, or run code or software, do not do it: these are scammers. Be sure to click "Report" or contact support.