Senior Site Reliability Engineer (AI)

125,200 - 132,500 CAD

Work format

remote (Canada only)

Employment type

full-time

Grade

senior

English

Country

Canada

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (AI): Building and optimizing the reliability and AI operations foundation for a semiconductor intelligence platform, with an emphasis on LLM observability, agentic pipeline resilience, and multi-region AWS architecture. Focus on designing blast radius containment for AI agents, automating observability via Datadog, and scaling an Internal Developer Platform (IDP).

Location: Remote for candidates based in Canada

Salary: $125,200 - $132,500 CAD

Company

hirify.global is the leading information platform providing in-depth intelligence and reverse engineering analysis for the semiconductor industry.

What you will do

Own SLOs, SLIs, and error budgets for all production services, driving error budget discipline across engineering.
Design reliability patterns for AI agent pipelines, including LLM observability, tool-use tracking, and graceful degradation.
Architect blast radius containment through isolation and circuit breaking to bound customer impact from agent failures.
Lead incident response and post-incident reviews, maturing the Canada Central/West active-active architecture toward a 24-hour RTO.
Partner with AI Engineering on compute provisioning, model serving, inference latency, and workload isolation.
Manage infrastructure as code via Terraform and GitOps, and oversee FinOps visibility for AWS cost segments.

Requirements

Must be based in Canada.
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
6–8 years of experience in SRE, platform engineering, or DevOps with demonstrated technical leadership.
Deep expertise in AWS (EKS, Lambda, CloudWatch) and multi-region architecture patterns.
Proficiency with Terraform, GitOps, and operational depth in Datadog.
Strong skills in Docker, Kubernetes, Python, and Bash; understanding of Java/Spring Boot microservices.

Nice to have

Experience designing reliability architecture for agentic AI systems and LLM-dependent services.
AWS Professional certifications (Solutions Architect or DevOps Engineer).
FinOps Certified Practitioner or cloud cost management experience at scale.
Experience in semiconductor, SaaS, or data-intensive platform environments.

Culture & Benefits

Comprehensive benefits package including health, dental, vision, and wellness.
Financial perks such as RRSP Matching and annual fitness reimbursement.
Flexible vacation policy and company-sponsored training and development.
Inclusive environment prioritizing diversity, equity, and accessibility.
High-growth environment focused on high performance and innovation.
