Company hidden
Posted 2 days ago

Senior Site Reliability Engineer (AI)

77,600 – 82,200 GBP
Work format
Remote (Europe only)
Employment type
Full-time
Level
Senior
English
B2
Country
UK/Poland

Job description

TL;DR

Senior Site Reliability Engineer (AWS/AI): Build and scale the reliability and AI operations foundation for a semiconductor intelligence platform, with an emphasis on LLM observability, blast-radius containment, and automated recovery. The role focuses on designing reliability patterns for AI agent pipelines, maturing active-active architectures, and establishing a comprehensive Internal Developer Platform (IDP).

Location: Remote for candidates based in the United Kingdom

Salary: £77,600 – £82,200

Company

The company (name hidden) is an information platform providing in-depth intelligence and reverse-engineering analysis for the semiconductor industry.

What you will do

  • Own SLOs, SLIs, and error budgets for production services and drive error budget discipline across engineering.
  • Design reliability patterns for AI agent pipelines, including LLM observability, tool-use tracking, and failure detection.
  • Architect blast-radius containment and mature the Canada Central/Canada West active-active architecture toward a 24-hour RTO.
  • Lead CI/CD pipeline strategy across Bitbucket Pipelines and GitHub Actions to increase deployment frequency.
  • Operate Datadog for service health and extend observability to AI workloads, covering metrics such as token consumption and agent completion rates.
  • Mentor junior and intermediate SREs and drive IDP adoption via Backstage or Atlassian Compass.

Requirements

  • 6–8 years of progressive experience in SRE, platform engineering, or DevOps, including technical leadership.
  • Deep expertise in AWS (EKS, Lambda, CloudWatch) and multi-region architecture patterns.
  • Proficiency with Terraform, GitOps, and policy-as-code (Sentinel, OPA/Rego).
  • Hands-on operational depth in Datadog, including dashboards, SLO tracking, and distributed tracing.
  • Strong containerization expertise with Docker and Kubernetes (EKS preferred).
  • Must be based in the United Kingdom.

Nice to have

  • Experience designing reliability architecture for agentic AI systems and LLM-dependent services.
  • AWS Professional certifications (Solutions Architect or DevOps Engineer).
  • FinOps Certified Practitioner or cloud cost management experience at scale.
  • Experience in semiconductor, SaaS, or data-intensive platform environments.

Culture & Benefits

  • Company-sponsored training and development opportunities.
  • Comprehensive benefits package including health, dental, vision, wellness, and retirement.
  • Flexible vacation policy and annual fitness reimbursement.
  • Inclusive environment prioritizing diversity, equity, and accessibility.
  • Community involvement opportunities through charitable alliances.
