Company hidden
1 day ago

Site Reliability Engineering (SRE) Ops Team Lead

Work format
remote (USA only)
Employment type
fulltime
Grade
lead
English
B2
Country
US
Listing from Hirify.Global, a list of international tech companies

Job description

TL;DR

Site Reliability Engineering (SRE) Ops Team Lead: Lead operations, reliability, and stability of production systems in hybrid cloud and on-prem environments, with an emphasis on incident response, observability, alerting, and automation. Focus on driving uptime, SLAs, cost optimization, capacity planning, and team leadership for high-availability SaaS platforms.

Location: United States (Remote). Restricted to US Persons only due to ITAR regulations.

Company

Global provider of mission-critical software solutions for various industries.

What you will do

  • Own day-to-day operations, support, and high-stakes incident response for always-on production systems.
  • Drive post-incident reviews, enforce runbooks, monitor SLIs/SLOs, and optimize on-call rotations.
  • Manage observability, telemetry, and alerting with tools such as Coralogix and FireHydrant, and build real-time dashboards.
  • Champion automation, GitOps practices, Terraform infrastructure, and rigorous change reviews.
  • Lead FinOps, capacity planning, cost optimization, and trade-offs for performance and reliability.
  • Mentor the SRE team, escalate issues, collaborate cross-functionally, and manage workflows in Jira.

Requirements

  • US Person status required due to ITAR restrictions on technical data access.
  • Deep hands-on experience in production operations, SRE, DevOps, or Infrastructure in hybrid cloud/on-prem.
  • Expertise in incident management, on-call best practices, and operational processes.
  • Proficiency with GitOps, Terraform, and observability tools.
  • Strong communication for leading incidents and cross-team coordination.

Nice to have

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • 5+ years in SRE, DevOps, Infrastructure, or Production Operations.
  • Cloud certifications (AWS, Azure, Google Cloud).
  • Experience in Agile/Scrum and Jira.
  • Background supporting high-availability SaaS platforms.

Culture & Benefits

  • Hands-on technical leadership in mission-critical systems with cutting-edge SRE and automation tech.
  • Collaboration with global engineering and product teams.
  • Competitive compensation and comprehensive benefits.
  • Exciting growth opportunities in a fast-paced environment.
