Company hidden
5 days ago

Site Reliability Engineer (AI)

Work format: onsite
Employment type: full-time
English: B2
Country: US
Job from the Hirify Global list of international tech companies

Job description

TL;DR

Site Reliability Engineer (AI): Own production reliability and platform engineering for the user-facing AI products Devin and hirify.global, with an emphasis on SLOs, incident response, and CI/CD pipelines. Focus on building monitoring and observability systems, reducing toil through automation, and ensuring infrastructure scales to serve hundreds of thousands of daily users.

Location: On-site in San Francisco Bay Area

Company

Applied AI lab building end-to-end software agents like Devin, the first AI software engineer, and hirify.global, an AI-native IDE.

What you will do

  • Define and own SLOs, SLIs, error budgets, monitoring, alerting, and observability for Devin and hirify.global.
  • Lead incident response, run blameless postmortems, and build runbooks and tooling for sustainable on-call.
  • Own deployment pipelines, release infrastructure, CI/CD, and internal developer tooling to enable fast shipping.
  • Manage cloud infrastructure as code with reproducible, version-controlled environments.
  • Perform capacity planning, performance profiling, and growth modeling.
  • Integrate security into reliability practices and foster reliability culture across product and engineering teams.

Requirements

  • Deep experience running production systems at scale: SLOs, error budgets, on-call rotations, incident command.
  • Strong software engineering fundamentals: you write real code.
  • Proficiency with cloud infrastructure (AWS, GCP, or Azure), Kubernetes, Terraform or equivalent IaC.
  • Experience building and owning CI/CD pipelines and deployment infrastructure.
  • Strong observability skills: instrumentation, dashboards, effective alerting.
  • Track record of systematic toil reduction through automation.
  • Comfort owning incidents end-to-end, and product empathy for how reliability affects users.

Nice to have

  • Experience with developer-facing products or platforms.

Culture & Benefits

  • Small, talent-dense team of competitive programmers, founders, and AI researchers from Scale AI, Palantir, Cursor, and Google DeepMind.
  • High ownership and trust: set your own reliability standards.
  • Proactive, systematic environment treating reliability as a craft.
  • Ship products used by hundreds of thousands of developers daily.