Director, Platform SRO
Job description
TL;DR
Director, Platform SRO (SRE/Platform Reliability): Lead stability, resilience, and operational readiness for mission-critical broadcast linear, live event, and digital media platforms, with an emphasis on high-severity incident command, real-time troubleshooting across media workflows, and SRO/SRE-driven reliability improvements. The role focuses on root cause analysis, observability and monitoring strategy, and disaster recovery and failover readiness to reduce operational risk in always-on live environments.
Company
The company is a media and entertainment business operating trusted brands across news, sports, and genre entertainment.
What you will do
- Coordinate high-severity incident response for broadcast linear channels, live events, and digital media platforms, serving as incident commander when required.
- Triage and troubleshoot issues across media workflows (playout, live production, contribution/distribution, and OTT delivery).
- Establish and execute incident management processes (escalation models, on-call coordination, communications, severity classification).
- Run post-incident reviews, root cause analyses, and corrective action plans to prevent recurrence.
- Improve reliability and operational maturity using SRO/SRE principles across on-prem, hybrid, and cloud media architectures.
- Define monitoring, alerting, and observability strategies; support disaster recovery, failover planning, and live-event readiness testing.
Requirements
- Experience supporting media, broadcast, streaming, digital publishing, or other 24x7 customer-facing platforms.
- Experience building or scaling SRE organizations and operational maturity programs; hands-on observability tooling experience (Datadog, New Relic, Splunk, Grafana, or similar).
- Familiarity with Infrastructure as Code and automation frameworks (Terraform, CloudFormation, or equivalent).
- Experience leading reliability initiatives across hybrid cloud and on-premises environments.
- Industry certifications such as AWS Solutions Architect, Google Professional Cloud Engineer, Azure Solutions Architect, ITIL, SRE Foundation, or equivalent.
- Experience implementing AI-assisted operational intelligence, event correlation, or automated incident response capabilities.
Culture & Benefits
- Full-time role with a good-faith annual compensation range of USD 180,000–210,000.
- Compensation may include additional benefits such as health insurance, retirement plans, and paid time off.
- An in-person interview with a Media employee at one of the company locations may be required during the selection process.