Manager, Site Reliability Engineering

$96,000 - $160,000

Work format

onsite

Employment type

full-time

English

Country

Job description

Text:

TL;DR

Manager, Site Reliability Engineering (AWS/Video Streaming): Lead platform stability, scalability, and security for a digital sports streaming application, with an emphasis on maintaining AWS infrastructure reliability and improving observability and automation. Focus on triaging video playback issues, guiding cloud architecture, and reducing mean time to recovery.

Location: New York City, NY (Onsite)

Salary: $96,000–$160,000 USD

Company

hirify.global is a premier live entertainment and media company, known for its next-generation entertainment medium, Sphere, and regional sports networks MSG Network and MSG Sportsnet.

What you will do

Own platform reliability, performance, and security for live and on-demand video streaming infrastructure.
Lead and mentor a small technical team (SRE, VideoOps) and act as a hands-on contributor.
Design and maintain robust monitoring, logging, and alerting systems using tools like CloudWatch, Datadog, and Conviva.
Define and enforce operational best practices including disaster recovery, redundancy, and failover strategies.
Investigate and resolve complex issues across the application stack, from infrastructure to video playback.
Lead incident response efforts and participate in an on-call rotation during peak traffic events.
Collaborate with Product and Engineering teams to guide architectural decisions prioritizing resilience and scalability.
Implement and continuously strengthen platform security, including identity management, IAM policies, and AWS-level hardening.

Requirements

5+ years of experience in SRE, DevOps, or platform infrastructure roles, with 2+ years in a team lead or manager capacity.
Experience operating and scaling production environments in AWS, including services like CloudFront, Lambda, S3, API Gateway, and CloudWatch.
AWS Certification (Solutions Architect, DevOps Engineer, or similar) or equivalent deep hands-on experience.
Strong background in system observability, with experience using tools like Conviva, CloudWatch, and Datadog.
Deep understanding of video streaming architecture including HLS/DASH, CDNs, DRM, SSAI, and multi-platform delivery.
Expertise in scripting and automation using Python, Bash, or similar, with infrastructure-as-code tools like Terraform or CloudFormation.
Proven ability to lead platform security initiatives, including IAM policy management.
Experience collaborating with engineering teams to improve CI/CD pipelines and automate infrastructure changes.
Strong analytical and troubleshooting skills across application, network, and video delivery layers.
Participation in an after-hours on-call rotation is expected, particularly during live sporting events and high-traffic periods (typically evenings EST).

Culture & Benefits

Opportunities for career growth and long-term development, supported by tools and resources for employee upskilling.
Commitment to diversity and equal employment opportunities for all backgrounds.
Compliance with non-discrimination laws and consideration of requests for reasonable accommodations.