TL;DR
Senior Site Reliability Engineer: Architect and implement scalable AWS cloud solutions for a global hospitality platform, with an emphasis on platform reliability, performance, and automation. The role focuses on maintaining heavily loaded Kubernetes clusters, developing robust monitoring systems, and leading incident response efforts.
Location: Remote First, Remote Always (working with a global team across 40+ countries)
Company
hirify.global is a product company transforming hospitality with an intelligently designed platform that powers properties across 150 countries, processing billions in bookings annually.
What you will do
- Design and implement reliable, scalable, and efficient cloud infrastructure.
- Maintain and support heavily loaded Kubernetes (EKS) clusters and infrastructure components.
- Develop and continuously improve monitoring and logging systems using the Prometheus, Datadog, and Loki stacks.
- Participate in on-call rotation and lead incident response efforts, ensuring minimal service impact.
- Collaborate with development teams to establish Service Level Objectives (SLOs) and ensure systems meet reliability targets.
- Champion SRE best practices across engineering, mentoring teams on resiliency, performance optimization, and scalability.
Requirements
- 5+ years of hands-on experience as an SRE or Systems Engineer working extensively with AWS cloud infrastructure.
- 3+ years of production experience with Kubernetes, Docker, and Helm charts at scale.
- Proven track record implementing and scaling Elastic Kubernetes Service (EKS) platforms.
- Strong expertise with monitoring, logging, and alerting technologies (ELK, Datadog, Loki, or AWS CloudWatch).
- Experience with GitOps and ArgoCD.
- Working knowledge of web infrastructure, including NGINX, Ingress controllers, MySQL/PostgreSQL/Aurora, Redis/Memcached, and SQS.
Nice to have
- Advanced Database Administration experience with Aurora, MySQL, or PostgreSQL.
- Experience supporting and enhancing the release process through CI/CD pipeline development and optimization.
- Experience working in PCI-compliant environments and security-focused infrastructure.
- Familiarity with Kong API Gateway and API management at scale.
Culture & Benefits
- Remote First, Remote Always work environment with a global team across 40+ countries.
- Paid time off (PTO) in accordance with local labor requirements, plus monthly Wellness Fridays.
- Fully paid parental leave and a home office stipend, based on country of residence.
- Access to professional development courses through hirify.global University.
- Free use of two corporate apartments for team members (San Diego and São Paulo).