Senior Site Reliability Engineer

Work format

remote (Global)

Employment type

fulltime

Grade

senior

English

Country

UK, Spain

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer: Lead the design of scalable, fault-tolerant, and self-healing systems in a multi-region AWS environment, with an emphasis on defining SLOs/SLIs and implementing long-term preventive measures. The role focuses on developing internal automation tools, building deep observability, and proactively mitigating operational risks through chaos engineering.

Location: Remote (global, work-from-anywhere stipend)

Company

hirify.global is the world’s first eSIM store, helping people connect in 200+ countries and regions across the globe and aiming to revolutionize the telecom industry.

What you will do

Lead the design of scalable, fault-tolerant, and self-healing systems in a multi-region AWS environment.
Define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to drive architectural decisions and error budget policies.
Conduct blameless post-incident reviews to uncover systemic root causes and implement long-term preventive measures.
Develop internal tools and automation to permanently eliminate patterns of manual work.
Shift from simple monitoring to deep observability, ensuring high-cardinality data leads to proactive, actionable insights.
Proactively identify and mitigate operational risks through chaos engineering and architecture reviews.
Refine the on-call experience to reduce alert fatigue, improve MTTR, and ensure sustainable rotation health.

Requirements

Bachelor’s degree in Computer Engineering or a similar discipline.
5+ years of experience as a Site Reliability Engineer or in a similar role.
3+ years of experience with AWS services, including strong knowledge of container orchestration.
2+ years of Kubernetes experience.
Deep understanding of observability principles and tools like Prometheus, Datadog, or OpenTelemetry.
Experience leading incident management and complex postmortem analysis.
Experience and interest in managing Infrastructure as Code (Terraform) and CI/CD tools such as GitHub Actions.
Proficiency in at least one programming language (Python, Go, Java, etc.) for building automation and internal tooling.
Experience with event-driven architecture (SNS, SQS, etc.).
Good communication skills and fluency in English.
Participation in the on-call rotation is a core expectation of this role, with no on-call duties for the first 6 months.

Nice to have

Prior experience with Scrum and other agile methods.
Certification in relevant areas such as AWS Certified DevOps Engineer or Certified Kubernetes Administrator (CKA).
Prior experience with Telco Core Networks (e.g., 5G/LTE Packet Core, IMS, Signaling) and low-latency networking.
Experience with AI-driven SRE tools for anomaly detection and improvements.
Deep understanding of eSIM and GSMA-related technologies and services.

Culture & Benefits

Remote-first environment with a work-from-anywhere stipend.
Health insurance, plus annual wellness and learning credits.
Annual all-expenses-paid company retreat in a gorgeous destination.
Company values SRE principles, data-driven decisions, and automation.
Fosters a blameless culture where everyone is encouraged to learn from mistakes and share knowledge.
Paid on-call rotation with standby fees + overtime pay, guaranteed rest periods, and flexible hours following night incidents.