Site Reliability Engineering (SRE) Ops Team Lead

Work format

remote (USA only)

Employment type

full-time

Grade

lead

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Site Reliability Engineering (SRE) Ops Team Lead: Lead operations, reliability, and stability of production systems in hybrid cloud and on-prem environments, with an emphasis on incident response, observability, alerting, and automation. Focus on driving uptime, SLAs, cost optimization, capacity planning, and team leadership for high-availability SaaS platforms.

Location: United States (Remote). Restricted to US Persons only due to ITAR regulations.

Company

Global provider of mission-critical software solutions for various industries.

What you will do

Own day-to-day operations, support, and high-stakes incident response for always-on production systems.
Drive post-incident reviews, enforce runbooks, monitor SLIs/SLOs, and optimize on-call rotations.
Manage observability, telemetry, and alerting with tools like Coralogix and FireHydrant, and build real-time dashboards.
Champion automation, GitOps practices, Terraform infrastructure, and rigorous change reviews.
Lead FinOps, capacity planning, cost optimization, and trade-offs for performance and reliability.
Mentor the SRE team, escalate issues, collaborate cross-functionally, and manage workflows in Jira.

Requirements

US Person status required due to ITAR restrictions on technical data access.
Deep hands-on experience in production operations, SRE, DevOps, or Infrastructure in hybrid cloud/on-prem.
Expertise in incident management, on-call best practices, and operational processes.
Proficiency with GitOps, Terraform, and observability tools.
Strong communication for leading incidents and cross-team coordination.

Nice to have

Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
5+ years in SRE, DevOps, Infrastructure, or Production Operations.
Cloud certifications (AWS, Azure, Google Cloud).
Experience in Agile/Scrum and Jira.
Background supporting high-availability SaaS platforms.

Culture & Benefits

Hands-on technical leadership in mission-critical systems with cutting-edge SRE and automation tech.
Collaboration with global engineering and product teams.
Competitive compensation and comprehensive benefits.
Exciting growth opportunities in a fast-paced environment.
