Site Reliability Engineer (AI)

Work format

onsite

Employment type

fulltime

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Site Reliability Engineer (AI): Own production reliability and platform engineering for the user-facing AI products Devin and hirify.global, with an emphasis on SLOs, incident response, and CI/CD pipelines. Focus on building monitoring and observability systems, reducing toil through automation, and ensuring the infrastructure scales with hundreds of thousands of daily users.

Location: On-site in San Francisco Bay Area

Company

Applied AI lab building end-to-end software agents like Devin, the first AI software engineer, and hirify.global, an AI-native IDE.

What you will do

Define and own SLOs, SLIs, error budgets, monitoring, alerting, and observability for Devin and hirify.global.
Lead incident response, run blameless postmortems, and build runbooks and tooling for sustainable on-call.
Own deployment pipelines, release infrastructure, CI/CD, and internal developer tooling to enable fast shipping.
Manage cloud infrastructure as code with reproducible, version-controlled environments.
Perform capacity planning, performance profiling, and growth modeling.
Integrate security into reliability practices and foster reliability culture across product and engineering teams.

Requirements

Deep experience running production systems at scale: SLOs, error budgets, on-call rotations, incident command.
Strong software engineering fundamentals; you write real code.
Proficiency with cloud infrastructure (AWS, GCP, or Azure), Kubernetes, Terraform or equivalent IaC.
Experience building and owning CI/CD pipelines and deployment infrastructure.
Strong observability skills: instrumentation, dashboards, effective alerting.
Track record of systematic toil reduction through automation.
Comfort owning incidents end-to-end, plus product empathy for user-facing reliability.

Nice to have

Experience with developer-facing products or platforms.

Culture & Benefits

Small, talent-dense team of competitive programmers, founders, and AI researchers from Scale AI, Palantir, Cursor, Google DeepMind.
High ownership and trust: set your own reliability standards.
Proactive, systematic environment treating reliability as a craft.
Ship products used by hundreds of thousands of developers daily.
