Senior Reliability Engineer - Technology
Job description
TL;DR
Senior Reliability Engineer (AWS/Kubernetes): Operating, observing, and improving the reliability of distributed systems, with an emphasis on observability, automated responses, and production behavior analysis. Focus on designing SLIs/SLOs, enhancing autoscaling and self-healing, and driving incident root cause analysis for high resilience.
Location: 100% Remote (Bogota office mentioned; team primarily based in Latin America)
Company
Leading nearshore staff augmentation provider headquartered in New York, partnering with U.S. companies and delivering tech solutions with 600+ professionals across Latin America.
What you will do
- Design and improve observability strategies including metrics, logs, traces, alerts, and dashboards across services.
- Analyze production system behavior, failure modes, bottlenecks, and reliability risks.
- Maintain AWS CDK/CDK8s constructs for observability, autoscaling, and safeguards; operate VPC, EKS, RDS, OpenSearch, MSK.
- Enhance Kubernetes add-ons such as ingress, cert-manager, autoscalers, and monitoring stacks.
- Define SLIs, SLOs, and alerting; improve automated responses, self-healing, and runbooks.
- Collaborate on incident investigations, RCA, and CI/CD for IaC; apply SRE principles such as error budgets.
Requirements
- 5+ years in SRE, Platform Engineering, or Infrastructure with production systems experience.
- Strong operational experience with observability: metrics, logs, traces, dashboards, and alerts for complex systems.
- Hands-on experience with AWS (VPC, IAM, RDS, MSK, S3, CloudWatch) and Kubernetes (Helm, RBAC, ServiceAccounts).
- Fluency in Python; IaC with AWS CDK, CDK8s, or equivalent.
- Experience with Prometheus, Grafana, alert tuning, and improving incident monitoring.
- Experience improving existing systems for operational excellence using observability data.
Nice to have
- Supporting Spark on Kubernetes, Argo, or Kafka batch pipelines.
- Designing reusable infrastructure/observability patterns or platform tooling.
Culture & Benefits
- 100% remote work with autonomy focused on results.
- Competitive pay in USD and paid time off for well-being.
- Work with top U.S. companies on high-impact projects.
- Culture emphasizing work-life balance, engagement activities, and a multicultural team across 25+ countries.
- Collaborate with senior experts in a dynamic, diverse network.