This vacancy is archived

Company hidden
Updated 2 months ago

Senior Site Reliability Engineer

Work format: hybrid
Employment type: full-time
Level: senior
English: B2
Country: Spain
Relocation: Spain

Job description


TL;DR

Senior Site Reliability Engineer: maintain and evolve the AWS and Kubernetes infrastructure behind Python-based AI services, with an emphasis on platform reliability, developer experience, and infrastructure as code. The focus is on migrating services to Kubernetes, improving CI/CD pipelines, and owning observability.

Location: This is a hybrid role based in Barcelona, Spain. Relocation support is provided for you and your family.

Company

Manychat is building a leading Chat Marketing platform used by over 1.5 million customers worldwide, focusing on Instagram, Messenger, WhatsApp, and TikTok automations.

What you will do

  • Maintain and harden AWS infrastructure (EC2, ALB/NLB, WAF, IAM, CloudWatch)
  • Operate and evolve EKS clusters powering Python-based AI services
  • Migrate existing services to Kubernetes using Terraform and Helm
  • Codify infrastructure with Terraform and manage host-level automation via Ansible
  • Build and improve CI/CD pipelines with GitHub Actions
  • Own observability efforts: Prometheus, Grafana, alerting, and on-call readiness
  • Support OS-level patching, certificate management, WAF rules, and general infrastructure hygiene
  • Partner with engineers to guide best practices and drive platform reliability
  • Create clean, maintainable infrastructure documentation and playbooks
  • Help resolve off-hours incidents when they occur (rare)

Requirements

  • 5+ years of experience managing Linux in production (Ubuntu, Amazon Linux)
  • Strong experience with Kubernetes (ideally EKS), Helm, and Terraform
  • Comfort with running and debugging Python workloads in containers
  • Solid understanding of networking, IAM, and cloud security best practices
  • Hands-on Nginx experience (Ingress and reverse proxy setups)
  • Excellent communication skills, including the ability to explain complex infrastructure clearly to developers

Nice to have

  • Strong Ansible skills beyond the basics
  • PostgreSQL or Amazon RDS tuning and operations experience
  • Deep understanding of observability tools (Prometheus, Grafana, Loki, etc.)
  • Familiarity with PHP production environments
  • Experience with TDD, CI/CD best practices, and agile development
  • Any previous SRE-style experience, such as resilience work, automation, or incident tooling

Culture & Benefits

  • Hybrid onboarding that lets you start working remotely, with relocation support for you and your family
  • Comprehensive health insurance for both you and your family
  • Professional development budget for conference tickets, online courses, and other relevant resources
  • Flexible benefits package so you can choose the perks that matter most to you
  • Hybrid work and generous leave options to prioritize work-life balance
  • In-office perks, including free meals and snacks
  • Company-funded sports activities, annual offsites, and team-building events