Company hidden
2 hours ago

Senior Site Reliability Engineer

$155,000 - $196,000
Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US

Job description

TL;DR

Senior Site Reliability Engineer: Improve and protect the reliability, performance, and operability of AWS-based production systems for commercial contractors, with an emphasis on modern SRE practices, observability, and infrastructure as code. The role focuses on designing reliable services, contributing to automation, and participating in incident response for an industry-defining company.

Location: Hybrid schedule required, Tuesday–Thursday in-office in Los Angeles, California. While applicants are welcome from across the U.S. where hirify.global is registered to do business, this specific role has a hybrid requirement in Los Angeles. Currently excludes AL, AK, CT, HI, KY, MS, NE, NM, ND, RI, SD, WV, WY for remote work within the U.S.

Salary: $155,000 - $196,000 base salary range + annual bonus

Company

hirify.global is building a software platform that empowers today’s commercial contractors, transforming the multi-billion dollar commercial contracting industry with AI-driven tools.

What you will do

  • Drive and refine modern SRE practices across services, including SLIs/SLOs and reliability reviews.
  • Design and maintain end-to-end observability (metrics, logs, traces, dashboards, and alerts).
  • Partner with product and engineering teams to design reliable services, reviewing architectures and rollout strategies.
  • Help evolve and operate AWS infrastructure (networking, compute, data stores) using Infrastructure as Code (Terraform).
  • Contribute code to services, tooling, and automation (reliability libraries, deployment tools, health checks).
  • Participate in incident response for infrastructure-related production issues, including post-incident reviews.

Requirements

  • 5+ years of professional experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or production-focused Software Engineering.
  • Proven experience leading multi-sprint, multi-engineer projects to successful completion.
  • Thorough understanding of modern SRE practices (SLIs/SLOs, error budgets, toil reduction, safe deployment).
  • Software engineering experience and comfort working in Python or Node.js/TypeScript.
  • Strong interest in and experience with using LLMs and AI-assisted tooling in your workflow.
  • Strong observability skills, including designing metrics, logging, tracing, and building actionable dashboards and alerts with tools like Datadog, Prometheus, or Grafana.
  • Experience with AWS in production, Terraform-based Infrastructure as Code, and container/orchestration platforms (Docker with ECS, EKS, or Kubernetes).
  • Ability and willingness to participate in a production on-call rotation.

Nice to have

  • Incident management experience, including participating in or coordinating incident response and working with tools like incident.io, PagerDuty, or Opsgenie.

Culture & Benefits

  • Generous equity grant, with opportunities to become an owner in the company.
  • MacBook computer provided for work.
  • Comprehensive benefits package and flexible PTO.
  • Hybrid work schedules and a work-from-home stipend.
  • Company events like BBQs and team-building activities, both in-person and virtual.
  • Fast-paced, collaborative, and dynamic work environment with opportunities for growth and career advancement.
