Senior Staff SRE (AI)

$207,000 - $261,000

Work format

remote (USA only)

Employment type

full-time

Grade

principal

English

Country


Job description

Text:

TL;DR

Senior Staff SRE (AI): Designing, operating, and evolving cloud infrastructure and operational platforms that power mission-critical SaaS and IoT services, with an emphasis on observability, intelligent automation, and AIOps capabilities. The role focuses on defining the technical vision for operational intelligence, leading large-scale automation initiatives, and embedding reliability into system design.

Location: Remote (USA) or Onsite (San Diego, CA)

Salary: $207,000.00 – $261,000.00

Company

hirify.global is a company focused on designing, operating, and evolving cloud infrastructure and operational platforms for mission-critical SaaS and IoT services at a global scale.

What you will do

Define and drive long-term strategy for observability, operational intelligence, and reliability engineering across the organization.
Lead the evolution towards intelligent operations by designing AIOps capabilities such as anomaly detection, event correlation, and automated remediation.
Architect and lead the end-to-end observability platform across metrics, logs, traces, and events.
Drive large-scale automation initiatives, including self-service infrastructure workflows, policy-as-code guardrails, and automated response.
Partner with product, platform, and data teams to embed reliability, performance, cost efficiency, and fault tolerance into system design.
Provide technical leadership during high-severity incidents and guide blameless postmortems.

Requirements

8–10+ years of experience in SRE, platform engineering, or cloud infrastructure roles supporting large-scale production environments.
Demonstrated experience leading architecture, reliability strategy, or operational platforms across multiple teams.
Deep expertise designing and operating large-scale AWS environments, including services like VPC, EC2, EKS/ECS, RDS/DynamoDB, and S3.
Senior-level experience with observability platforms (New Relic, Datadog, Prometheus/Grafana, OpenTelemetry).
Expert-level experience with Infrastructure-as-Code using Terraform and/or CloudFormation, including GitOps workflows.
Strong scripting or programming skills (Python, Go, Bash) and expert understanding of Linux systems, networking, and Kubernetes.

Nice to have

Experience implementing or evaluating AIOps capabilities such as anomaly detection or predictive alerting.
Familiarity with applying machine learning or AI techniques to operational data or reliability workflows.

Culture & Benefits

Comprehensive medical, dental, and vision insurance, with Health Savings Account and Flexible Spending Accounts.
401(k) and 401(k) match.
Flexible Time Off (FTO) or Paid Time Off (PTO), plus 11 paid holidays and 1 inclusive holiday per year.
Employee Well-Being program and Education Reimbursement Program.
Commitment to building a diverse and inclusive workforce and providing reasonable accommodation for candidates with disabilities.


The job posting text is reproduced without changes.
