Company hidden
1 hour ago

Senior Site Reliability Engineer

Work format
remote (USA only)
Employment type
fulltime
Grade
senior
English
b2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer: Building tools, automation, and observability for resilient high-scale systems supporting fan engagement platforms, with an emphasis on metrics, alerting, and incident response. Focus on defining SLIs/SLOs, streamlining CI/CD pipelines, automating reliability checks, and driving operational excellence through blameless postmortems and capacity planning.

Location: Remote (US-based, US work authorization required). Hybrid/flexible work environment.

Company

Growth-stage company providing fan engagement platforms for high school sports, including ticketing, streaming, fundraising, and more, trusted by thousands of US schools.

What you will do

  • Assess and improve system visibility by reviewing dashboards, metrics, logs, and implementing targeted enhancements.
  • Refine monitoring, alerting, and dashboards for critical services to enable faster issue detection and response.
  • Integrate observability and telemetry into build, deploy, and release processes.
  • Define SLIs/SLOs for core user flows and align teams on reliability standards.
  • Streamline incident response, automate routine tasks, and participate in on-call rotations.
  • Partner with engineering teams to implement reliability best practices, release automation, and proactive incident prevention.

Requirements

  • Solid experience in Python for automation and operational tasks
  • Proficiency in at least one of Java, C++, or Go
  • Strong knowledge of Linux, cloud infrastructure (AWS, GCP, Azure), Docker, Kubernetes, Terraform
  • Experience with CI/CD pipelines, version control, automated testing, observability tools (Prometheus, Grafana, ELK, Datadog)
  • Proven experience with SLAs/SLOs, critical user journeys, incident facilitation, and cross-functional collaboration
  • Problem-solving mindset treating reliability as a shared responsibility

Nice to have

  • Experience with end-to-end/integration tests, performance testing, chaos engineering
  • Contributions to developer tooling or reliability frameworks
  • Exposure to security, compliance, change management
  • Relevant certifications

Culture & Benefits

  • Culture focused on accountability, collaboration, growth, and fairness
  • Multiple medical, dental, vision, life, and disability insurance plans
  • 401(k) with company match, company equity (stock options), Employee Emergency Fund
  • Open PTO policy
  • Health benefits eligibility requires full-time employment
