Senior Site Reliability Engineer (AI)
Job description
TL;DR
Senior Site Reliability Engineer (Python/AWS): build and optimize high-availability infrastructure for a public safety AI platform, with an emphasis on system resilience and observability. The focus is on diagnosing the root causes of failures in Kubernetes, optimizing high-throughput messaging systems, and implementing AI-driven reliability improvements.
Location: Remote (New York or Boston); must be able to collaborate in-person a few times per quarter.
Salary: $160,000 - $195,000
Company
The company is a leading public safety AI provider, delivering mission-critical intelligence to first responders and security teams to enable faster emergency response.
What you will do
- Own performance and reliability outcomes by optimizing connection pooling, database architecture, and traffic routing.
- Design for system resilience through safer deployment patterns, failover strategies, and redundancy.
- Implement deep observability using structured logging, metrics, and alerting to detect issues before escalation.
- Manage production incidents from the initial signal through final resolution, including implementing root-cause fixes.
- Work across infrastructure-as-code, container orchestration, and application code to drive stability.
Requirements
- 5+ years of professional engineering experience with deep expertise in Python.
- Hands-on experience with AWS (networking, managed databases, IAM, DNS routing).
- Production experience with container orchestration: Kubernetes (EKS), ECS, or Fargate.
- Strong understanding of distributed systems failure modes (resource exhaustion, replication lag).
- Experience with high-throughput messaging (RabbitMQ, Kafka, SNS/SQS) and Terraform.
- Must be based in or near New York or Boston for occasional in-person collaboration.
Nice to have
- Experience with on-call rotations for mission-critical production systems.
- Proficiency with Datadog (APM, alerting), Elasticsearch, or OpenSearch.
- Experience with ArgoCD and GitOps deployments.
- Experience modernizing legacy CI/CD pipelines (Jenkins, Concourse).
Culture & Benefits
- Opportunity to work on a mission-driven product that saves lives globally.
- Competitive salary and benefits package.
- Equity participation (stock options).
- Dynamic and flexible startup environment with a highly talented team.