Director, Platform SRO
Job description
TL;DR
Director, Platform SRO (SRE/Media Platforms): Lead stability, resilience, and operational readiness for mission-critical broadcast linear, live event, and digital media platforms, with an emphasis on incident command, real-time troubleshooting, and reliability engineering. Focus on strengthening monitoring/observability, driving root cause analysis and corrective actions, and improving operational maturity across on-prem, hybrid, and cloud media architectures.
Location: 229 West 43rd Street, New York, New York
Salary: USD 180,000 - USD 210,000 yearly
Company
is an industry media and entertainment business operating trusted brands across political news, business/personal finance, golf, and sports/genre entertainment.
What you will do
- Coordinate high-severity incident response for broadcast linear channels, live events, and digital media platforms, serving as incident commander when required.
- Triage and troubleshoot issues across media workflows (playout, live production, contribution/distribution, and OTT delivery).
- Establish and run incident management processes, including escalation models, on-call coordination, communications, and severity classification.
- Conduct post-incident reviews, root cause analyses, and corrective action plans to reduce operational risk and prevent recurrence.
- Improve reliability and operational readiness across on-prem, hybrid, and cloud media architectures, including disaster recovery and failover planning.
- Develop runbooks/SOPs and mentor teams on incident response and reliability engineering best practices.
Requirements
- Experience supporting media, broadcast, streaming, digital publishing, or other 24x7 customer-facing platforms.
- Experience building or scaling SRE organizations and operational maturity programs; hands-on observability experience with tools such as Datadog, New Relic, Splunk, or Grafana.
- Familiarity with Infrastructure as Code and automation frameworks such as Terraform, CloudFormation, or equivalent.
- Experience leading reliability initiatives across hybrid cloud and on-premises environments.
- Relevant certifications (e.g., AWS Solutions Architect, Google Professional Cloud Engineer, Azure Solutions Architect, ITIL, SRE Foundation) or equivalent.
- Experience implementing AI-assisted operational intelligence, event correlation, or automated incident response capabilities.
Culture & Benefits
- In-person interview may be required for external candidates at a location prior to a hiring decision.
- Equal employment opportunity policy and support for reasonable accommodations during the application/recruitment process.
- Compensation includes a good-faith pay range; actual compensation may vary based on skills, qualifications, experience, and location, and may include benefits such as health insurance, retirement plans, and paid time off.
Hiring process
- Interview with a Media employee in person (for external candidates) prior to a hiring decision.
- Evaluation of fit based on experience with SRE/incident management, observability, reliability initiatives, and operational maturity.