Senior Site Reliability Engineer (AI)

$160,000 - $195,000

Work format

remote (USA only)

Employment type

full-time

Grade

senior

English

Country

This vacancy is from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (Python/AWS): Building and optimizing high-availability infrastructure for a public safety AI platform, with an emphasis on system resilience and observability. Focus on diagnosing the root causes of failures in Kubernetes, optimizing high-throughput messaging systems, and implementing AI-driven reliability improvements.

Location: Remote (New York or Boston); must be able to collaborate in-person a few times per quarter.

Salary: $160,000 - $195,000

Company

hirify.global is a leading public safety AI company that provides mission-critical intelligence to first responders and security teams to enable faster emergency response.

What you will do

Own performance and reliability outcomes by optimizing connection pooling, database architecture, and traffic routing.
Design for system resilience through safer deployment patterns, failover strategies, and redundancy.
Implement deep observability using structured logging, metrics, and alerting to detect issues before escalation.
Manage production incidents from initial signal to final resolution, including root-cause analysis and remediation.
Work across infrastructure-as-code, container orchestration, and application code to drive stability.

Requirements

5+ years of professional engineering experience with deep expertise in Python.
Hands-on experience with AWS (networking, managed databases, IAM, DNS routing).
Production experience with container orchestration on AWS (Kubernetes on EKS, ECS, or Fargate).
Strong understanding of distributed systems failure modes (resource exhaustion, replication lag).
Experience with high-throughput messaging (RabbitMQ, Kafka, SNS/SQS) and Terraform.
Must be based in or near New York or Boston for occasional in-person collaboration.

Nice to have

Experience with on-call rotations for mission-critical production systems.
Proficiency with Datadog (APM, alerting), Elasticsearch, or OpenSearch.
Experience with ArgoCD and GitOps deployments.
Experience modernizing legacy CI/CD pipelines (Jenkins, Concourse).

Culture & Benefits

Opportunity to work on a mission-driven product that saves lives globally.
Competitive salary and benefits package.
Equity participation (stock options).
Dynamic and flexible startup environment with a highly talented team.
