TL;DR
Senior Site Reliability Engineer: Strengthening infrastructure and enhancing the ability to deploy, monitor, and scale SaaS platform systems with an accent on cloud-native systems, Kubernetes, and infrastructure-as-code. Focus on defining SLOs, improving incident response, and embedding reliability principles into service design.
Location: Hybrid work model, with offices near the Bernabeu Stadium in Madrid.
Company
hirify.global is a leader in digital employee experience management software, providing unprecedented insight to IT leaders for proactive optimization of employee experience.
What you will do
- Implement and manage cloud-native AWS systems and automate operations.
- Operate and enhance Kubernetes clusters, deployment pipelines, and service meshes.
- Design, build, and maintain the infrastructure for a multi-tenant SaaS platform, focusing on reliability, security, and scalability.
- Define and maintain SLOs, SLAs, and error budgets, and proactively address availability and performance issues.
- Participate in a shared on-call rotation, acting as Incident Commander and refining incident response processes.
- Work closely with software engineers to embed observability, fault tolerance, and reliability principles into service design.
Requirements
- Minimum Bachelor’s degree in Computer Science or equivalent practical experience.
- 5+ years of experience as a Site Reliability Engineer or Platform Engineer.
- Strong hands-on experience with public cloud services (AWS) and supporting SaaS products.
- Proficiency in programming/scripting (Python, Go, Bash) and infrastructure-as-code (Terraform).
- Experience with Kubernetes, Docker, CI/CD pipelines, and monitoring solutions (Datadog).
- Comfortable with participating in a rotating on-call schedule and leading post-incident reviews.
- Deep understanding of Linux systems, networking, and common troubleshooting practices.
- Excellent written and verbal skills in English.
Nice to have
- Exposure to compliance standards such as SOC 2, ISO 27001, or HIPAA; FedRAMP experience.
- Experience with chaos engineering or resilience testing practices.
Culture & Benefits
- Permanent contract and a competitive compensation package.
- Private Health Insurance (Sanitas) and daily meal vouchers of 11 EUR.
- Hybrid work model and flexible hours, plus unlimited vacation, 23 holidays, and 3 volunteer days.
- Up to 25 EUR per month for a gym subscription and flexible compensation for childcare & public transportation.
- Reimbursement of up to 50% of the cost of English & Spanish classes.
- Regular company and team events.
- A relocation package for candidates coming from another country.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →