Company hidden
18 hours ago

Senior Staff SRE (AI)

$207,000 - $261,000
Work format
remote (USA only)
Employment type
full-time
Grade
principal
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Staff SRE (AI): Design, operate, and evolve the cloud infrastructure and operational platforms that power mission-critical SaaS and IoT services, with an emphasis on observability, intelligent automation, and AIOps capabilities. The role focuses on defining the technical vision for operational intelligence, leading large-scale automation initiatives, and embedding reliability into system design.

Location: Remote (USA) or Onsite (San Diego, CA)

Salary: $207,000.00 – $261,000.00

Company

hirify.global is a company focused on designing, operating, and evolving cloud infrastructure and operational platforms for mission-critical SaaS and IoT services at a global scale.

What you will do

  • Define and drive long-term strategy for observability, operational intelligence, and reliability engineering across the organization.
  • Lead the evolution towards intelligent operations by designing AIOps capabilities such as anomaly detection, event correlation, and automated remediation.
  • Architect and lead the end-to-end observability platform across metrics, logs, traces, and events.
  • Drive large-scale automation initiatives, including self-service infrastructure workflows, policy-as-code guardrails, and automated response.
  • Partner with product, platform, and data teams to embed reliability, performance, cost efficiency, and fault tolerance into system design.
  • Provide technical leadership during high-severity incidents and guide blameless postmortems.

Requirements

  • 8–10+ years of experience in SRE, platform engineering, or cloud infrastructure roles supporting large-scale production environments.
  • Demonstrated experience leading architecture, reliability strategy, or operational platforms across multiple teams.
  • Deep expertise designing and operating large-scale AWS environments, including services like VPC, EC2, EKS/ECS, RDS/DynamoDB, and S3.
  • Senior-level experience with observability platforms (New Relic, Datadog, Prometheus/Grafana, OpenTelemetry).
  • Expert-level experience with Infrastructure-as-Code using Terraform and/or CloudFormation, including GitOps workflows.
  • Strong scripting or programming skills (Python, Go, Bash) and expert understanding of Linux systems, networking, and Kubernetes.

Nice to have

  • Experience implementing or evaluating AIOps capabilities such as anomaly detection or predictive alerting.
  • Familiarity with applying machine learning or AI techniques to operational data or reliability workflows.

Culture & Benefits

  • Comprehensive medical, dental, and vision insurance, with Health Savings Account and Flexible Spending Accounts.
  • 401(k) and 401(k) match.
  • Flexible Time Off (FTO) or Paid Time Off (PTO), plus 11 paid holidays and 1 inclusive holiday per year.
  • Employee Well-Being program and Education Reimbursement Program.
  • Commitment to building a diverse and inclusive workforce and providing reasonable accommodation for candidates with disabilities.
