Senior Site Reliability Engineer (AI)

77,600 - 82,200 GBP

Work format

remote (Europe only)

Employment type

full-time

Grade

senior

English

Country

UK/Poland

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (AWS/AI): Building and scaling the reliability and AI operations foundation for a semiconductor intelligence platform, with an emphasis on LLM observability, blast radius containment, and automated recovery. Focus on designing reliability patterns for AI agent pipelines, maturing active-active architectures, and establishing a comprehensive Internal Developer Platform.

Location: Remote for candidates based in the United Kingdom

Salary: £77,600 – £82,200 GBP

Company

hirify.global is an information platform providing in-depth intelligence and reverse engineering analysis for the semiconductor industry.

What you will do

Own SLOs, SLIs, and error budgets for production services and drive error budget discipline across engineering.
Design reliability patterns for AI agent pipelines, including LLM observability, tool-use tracking, and failure detection.
Architect blast radius containment and mature the Canada Central/West active-active architecture toward a 24-hour RTO.
Lead CI/CD pipeline strategy using Bitbucket Pipelines and GitHub Actions to optimize deployment frequency.
Operate Datadog for service health and extend observability to AI workloads like token consumption and agent completion rates.
Mentor junior and intermediate SREs and drive Internal Developer Platform (IDP) adoption via Backstage or Atlassian Compass.

Requirements

6–8 years of progressive experience in SRE, platform engineering, or DevOps with technical leadership.
Deep expertise in AWS (EKS, Lambda, CloudWatch) and multi-region architecture patterns.
Proficiency with Terraform, GitOps, and policy-as-code (Sentinel, OPA/Rego).
Hands-on operational depth in Datadog, including dashboards, SLO tracking, and distributed tracing.
Strong containerization expertise with Docker and Kubernetes (EKS preferred).
Must be based in the United Kingdom.

Nice to have

Experience designing reliability architecture for agentic AI systems and LLM-dependent services.
AWS Professional certifications (Solutions Architect or DevOps Engineer).
FinOps Certified Practitioner or cloud cost management experience at scale.
Experience in semiconductor, SaaS, or data-intensive platform environments.

Culture & Benefits

Company-sponsored training and development opportunities.
Comprehensive benefits package including health, dental, vision, wellness, and retirement.
Flexible vacation policy and annual fitness reimbursement.
Inclusive environment prioritizing diversity, equity, and accessibility.
Community involvement opportunities through charitable alliances.
