Incident & Change Champion (AI Infrastructure)

Work format

remote (Global)

Employment type

Full-time

Level

Senior

English

Country

UK/US/Norway +2 more

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Incident & Change Champion (AI Infrastructure): Owning and optimizing Incident and Change Management processes for a GPU cloud platform, with an emphasis on operational discipline and tooling implementation. The role focuses on reducing system downtime through disciplined major incident coordination, CAB leadership, and a blameless postmortem culture.

Location: Remote (Global)

Company

hirify.global is a GPU cloud engineered for AI, providing cost-effective, high-performance infrastructure for AI start-ups and large enterprise customers.

What you will do

Develop and refine Incident and Change Management processes to v1.0, including severity declarations, SLA/SLO tables, and communication ladders.
Lead the migration and implementation of incident and change workflows within Jira Service Management.
Act as Incident Commander or Major Incident Manager for SEV-1 and complex SEV-2 events, coordinating internal and external communications.
Chair the Change Advisory Board (CAB) and manage the change calendar, including freeze windows for critical periods.
Train and certify a pool of Incident Commanders across Support and SRE teams and run quarterly tabletop exercises.
Define and report key operational metrics (MTTA, MTTR, change success rate) to the senior leadership team.

Requirements

5+ years in ITSM / Service Management roles with direct ownership of Incident and Change Management processes.
Hands-on experience facilitating major incidents end-to-end as an Incident Commander in a 24/7 production environment.
Demonstrable experience running a Change Advisory Board or equivalent change-review forum.
Proven track record configuring Jira Service Management, ServiceNow, or equivalent ITSM tooling.
Strong technical writing skills for process documents, postmortems, and executive reports.
Comfort holding the room under pressure with senior stakeholders, engineers, and customers concurrently.

Nice to have

Experience in cloud, hyperscaler, AI infrastructure, or HPC environments.
Familiarity with SRE concepts, including SLOs, error budgets, and runbook discipline.
Experience designing and running tabletop exercises and game days.
Experience operating processes for regulated or sovereign customer workloads.
Familiarity with Jira automation and JSM portals.

Culture & Benefits

Competitive compensation package including base salary and equity with annual reviews.
Remote-first work environment with high autonomy and human-first flexibility.
Opportunity to join a fast-growing tech startup pushing the boundaries of AI infrastructure.
Dynamic progression plan tailored to individual professional ambitions.
Collaborative and supportive environment focused on ownership, transparency, and accountability.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →