Company hidden
8 hours ago

Senior Site Reliability Engineer (AI)

125,200 - 132,500 CAD
Work format
Remote (Canada only)
Employment type
Full-time
Level
Senior
English
B2
Country
Canada
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (AI): Build and optimize the reliability and AI operations foundation of a semiconductor intelligence platform, with emphasis on LLM observability, agentic pipeline resilience, and multi-region AWS architecture. Key areas include designing blast radius containment for AI agents, automating observability with Datadog, and scaling an Internal Developer Platform (IDP).

Location: Remote for candidates based in Canada

Salary: $125,200 - $132,500 CAD

Company

The company (name hidden) is the leading information platform providing in-depth intelligence and reverse engineering analysis for the semiconductor industry.

What you will do

  • Own SLOs, SLIs, and error budgets for all production services, driving error budget discipline across engineering.
  • Design reliability patterns for AI agent pipelines, including LLM observability, tool-use tracking, and graceful degradation.
  • Architect blast radius containment through isolation and circuit breaking to bound customer impact from agent failures.
  • Lead incident response and post-incident reviews, maturing the Canada Central/West active-active architecture toward a 24-hour RTO.
  • Partner with AI Engineering on compute provisioning, model serving, inference latency, and workload isolation.
  • Manage infrastructure as code via Terraform and GitOps, and oversee FinOps visibility for AWS cost segments.

Requirements

  • Must be based in Canada
  • Bachelor's degree in Computer Science, Engineering, or equivalent experience.
  • 6–8 years of experience in SRE, platform engineering, or DevOps with demonstrated technical leadership.
  • Deep expertise in AWS (EKS, Lambda, CloudWatch) and multi-region architecture patterns.
  • Proficiency with Terraform, GitOps, and operational depth in Datadog.
  • Strong skills in Docker, Kubernetes, Python, and Bash; understanding of Java/Spring Boot microservices.

Nice to have

  • Experience designing reliability architecture for agentic AI systems and LLM-dependent services.
  • AWS Professional certifications (Solutions Architect or DevOps Engineer).
  • FinOps Certified Practitioner or cloud cost management experience at scale.
  • Experience in semiconductor, SaaS, or data-intensive platform environments.

Culture & Benefits

  • Comprehensive benefits package including health, dental, vision, and wellness.
  • Financial perks such as RRSP matching and an annual fitness reimbursement.
  • Flexible vacation policy and company-sponsored training and development.
  • Inclusive environment prioritizing diversity, equity, and accessibility.
  • High-growth environment focused on high performance and innovation.
