Senior Reliability Engineer (AWS/Kubernetes)

Work format

remote (Global)

Employment type

full-time

Grade

senior

English

Country

US/DR

Job posting from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Reliability Engineer (AWS/Kubernetes): operate, observe, and improve the reliability of distributed systems on AWS and Kubernetes, with an emphasis on observability, operational maturity, and automated responses to system behavior. The role focuses on designing observability strategies, defining SLIs/SLOs, enhancing autoscaling and self-healing mechanisms, and performing root cause analysis for production incidents.

100% Remote

Company

A leading nearshore staff augmentation provider headquartered in New York, with 600+ professionals based in Latin America, partnering with U.S. companies on digital transformation projects.

What you will do

Design and improve observability strategies including metrics, logs, traces, alerts, and dashboards.
Analyze system behavior in production to identify failure modes, bottlenecks, and risks.
Maintain AWS CDK/CDK8s constructs and core platform components such as VPC, EKS, RDS, OpenSearch, and MSK.
Operate Kubernetes add-ons such as ingress controllers, cert-manager, autoscalers, and monitoring stacks.
Define SLIs, SLOs, and alerting strategies, and automate operational responses, including self-healing.
Collaborate on incident investigations, root cause analysis, and CI/CD for IaC and observability.

Requirements

5+ years in SRE, Platform Engineering, or Infrastructure with production systems experience.
Strong experience operating observability for complex systems: metrics, logs, traces, dashboards, and alerts.
Hands-on experience with AWS (VPC, IAM, RDS, MSK, S3, CloudWatch) and Kubernetes (Helm, RBAC, ServiceAccounts).
Fluency in Python and IaC with AWS CDK, CDK8s, or equivalent.
Experience with Prometheus, Grafana, alert tuning, and incident-driven monitoring improvements.
Experience improving existing systems for operational excellence and reliability.

Nice to have

Experience with Spark on Kubernetes, Argo, or Kafka-based batch pipelines.

Culture & Benefits

100% remote work with autonomy focused on results.
Competitive USD compensation.
Paid time off for well-being.
Work with top U.S. companies on high-impact projects.
Diverse multicultural team across 25+ countries emphasizing work-life balance.
