Senior Site Reliability Engineer

$155,000 - $196,000

Work format

hybrid

Employment type

full-time

Grade

senior

English

Country

Job description

Text:

TL;DR

Senior Site Reliability Engineer: Improving and protecting the reliability, performance, and operability of AWS-based production systems for commercial contractors, with an emphasis on modern SRE practices, observability, and infrastructure as code. Focus on designing reliable services, contributing to automation, and participating in incident response for an industry-defining company.

Location: Hybrid schedule required, Tuesday–Thursday in-office in Los Angeles, California. While applicants are welcome from across the U.S. where hirify.global is registered to do business, this specific role has a hybrid requirement in Los Angeles. Currently excludes AL, AK, CT, HI, KY, MS, NE, NM, ND, RI, SD, WV, WY for remote work within the U.S.

Salary: $155,000 - $196,000 base salary range + annual bonus

Company

hirify.global is building a software platform that empowers today’s commercial contractors, transforming the multi-billion dollar commercial contracting industry with AI-driven tools.

What you will do

Drive and refine modern SRE practices across services, including SLIs/SLOs and reliability reviews.
Design and maintain end-to-end observability (metrics, logs, traces, dashboards, and alerts).
Partner with product and engineering teams to design reliable services, reviewing architectures and rollout strategies.
Help evolve and operate AWS infrastructure (networking, compute, data stores) using Infrastructure as Code (Terraform).
Contribute code to services, tooling, and automation (reliability libraries, deployment tools, health checks).
Participate in incident response for infrastructure-related production issues, including post-incident reviews.

Requirements

5+ years of professional experience in Site Reliability Engineering, DevOps, Infrastructure Engineering, or production-focused Software Engineering.
Proven experience leading multi-sprint, multi-engineer projects to successful completion.
Thorough understanding of modern SRE practices (SLIs/SLOs, error budgets, toil reduction, safe deployment).
Software engineering experience and comfort working in Python or Node.js/TypeScript.
Strong interest in, and experience with, using LLMs and AI-assisted tooling in your workflow.
Strong observability skills, including designing metrics, logging, tracing, and building actionable dashboards and alerts with tools like Datadog, Prometheus, or Grafana.
Experience with AWS in production, Terraform-based Infrastructure as Code, and container/orchestration platforms (Docker with ECS, EKS, or Kubernetes).
Ability and willingness to participate in a production on-call rotation.

Nice to have

Incident management experience, including participating in or coordinating incident response and working with tools like incident.io, PagerDuty, or Opsgenie.

Culture & Benefits

Generous equity grant, with opportunities to become an owner in the company.
MacBook computer provided for work.
Comprehensive benefits package and flexible PTO.
Hybrid work schedules and a work-from-home stipend.
Company events like BBQs and team-building activities, both in-person and virtual.
Fast-paced, collaborative, and dynamic work environment with opportunities for growth and career advancement.

