Staff SRE, Performance & Reliability

$181,220–$217,464

Work format

Hybrid

Employment type

Full-time

Grade

Lead

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff SRE (Performance & Reliability): Drive the development of automation and observability tooling to improve operational efficiency and platform reliability for hirify.global's edge cloud platform, with an emphasis on traffic monitoring, alerting, and incident response. The role focuses on defining SLIs/SLOs, performing root-cause investigations, and ensuring resilience during demand shifts at global scale.

Location: Hybrid in San Francisco, CA; New York, NY; or Denver, CO. Remote candidates within the US with exceptional qualifications may be considered.

Salary: $181,220–$217,464

Company

hirify.global is an edge cloud platform company enabling customers to create fast, secure, and reliable digital experiences for high-volume internet traffic.

What you will do

Drive the development of automation and observability tooling for operational efficiency and platform reliability.
Partner with observability teams to implement and improve dashboards (Grafana, Prometheus) and metrics pipelines.
Define and improve SLIs/SLOs and monitoring frameworks to proactively surface issues.
Leverage data pipelines (SQL, BigQuery) for trend analysis, capacity planning, and traffic pattern recognition.
Lead incident response, mitigation, and communication, resolving issues proactively.
Monitor seasonal patterns, major events, and global traffic distribution to maintain platform resilience.

Requirements

8+ years of experience in Site Reliability Engineering, Systems Engineering, or Platform/Infrastructure Engineering.
Professional experience operating in CDN, streaming media, or high-volume internet traffic environments.
Deep understanding of network/distributed/cloud systems: TCP/IP, DNS, HTTP/S, TLS, caching/proxy/CDN technologies.
Demonstrated ability to build automation, tooling, and observability systems (Prometheus, Grafana, BigQuery/SQL).
Hands-on experience with scripting or programming (Python, Go, Shell).
Strong communication skills and ability to coordinate complex technical work across multiple teams.
Must be eligible to work in the US.

Nice to have

Experience with large-scale data analytics systems (Spark, Presto).
Familiarity with cloud platforms (AWS, GCP, Azure), infrastructure as code, or container orchestration (Terraform, Kubernetes).
Background in media, live events, or streaming operations.

Culture & Benefits

Flexible vacation policy and up to 18 days of paid sick leave.
Comprehensive benefits package including medical, dental, vision, family planning, and mental health support.
401(k) with company match and an Employee Stock Purchase Program.
11 paid local holidays and 11 paid company wellness days.
In-person new hire orientation in San Francisco for team connection and learning.
Commitment to diversity, inclusion, and a supportive work environment.
