SRE (Site Reliability Engineering) (GCP)

150 000 - 275 000$

Формат работы

onsite

Тип работы

fulltime

Грейд

senior

Английский

Страна

Вакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:

TL;DR

SRE (Site Reliability Engineering) (GCP): Own reliability across GCP infrastructure, lead incident response, and drive scalable database and search operations with an accent on Kubernetes, Terraform-managed environments, and production observability. Focus on preventing recurrence through end-to-end incident handling, scaling MySQL/Postgres, and tuning Elasticsearch clusters under real-world load.

Location: New York City (On-site)

Salary: $150K–$275K (Offers Equity)

Company

hirify.global builds a large-scale platform for gaming clips and creator-powered gaming communities.

What you will do

Own reliability across GCP infrastructure, including Kubernetes clusters, managed services, and data pipelines to improve availability and latency.
Lead incident response end-to-end: on-call rotations, runbooks, postmortems, and follow-through to prevent recurrence.
Architect and execute database scaling strategies (sharding, replication, query optimization, capacity planning) for MySQL and Postgres.
Manage and evolve Terraform-managed GCP and Kubernetes configurations.
Own Elasticsearch end-to-end: capacity planning, sharding strategy, index lifecycle management, upgrades, and performance tuning.
Build observability (metrics, dashboards, alerting, tracing) and improve CI/CD reliability across GitHub Actions; harden IAM, secrets management, and network segmentation.

Requirements

Hands-on experience scaling and sharding relational databases (MySQL/Postgres) in production.
Strong GCP experience with Kubernetes, VPC, IAM, Cloud Logging, and managed services.
Fluency in Terraform with ownership of infrastructure-as-code at scale.
Production experience operating Elasticsearch and keeping clusters healthy.
Proven incident response skills: calm execution during P0s, clear communication, and postmortems that prevent recurrence.
Excellent communication skills to flag issues quickly and lead/write actionable postmortems.

Culture & Benefits

Comprehensive medical, dental, and vision coverage.
401(k) and wellness/fitness perks (Wellhub membership) plus mental health resources.
Paid parental leave, fertility and maternal health benefits.
Generous PTO and daily meals and commuter benefits at the NYC HQ in Flatiron.
Learning and development stipend; competitive salary with meaningful equity.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →