Company hidden
Posted 6 hours ago

Site Reliability Engineer (SRE)

Work format
remote (Global)
Employment type
full-time
Grade
senior
English
C1
Listing from Hirify Global, a list of international tech companies

Job description

TL;DR

Site Reliability Engineer (SRE): Designing and implementing automated, highly available, and scalable cloud infrastructure for a data platform and AI workloads, with an emphasis on observability, database performance, and incident response. Focus on optimizing machine learning model deployment, supporting high-traffic multi-tenant systems, and ensuring robust security and compliance.

Location: Remote (Global). Company hubs are in France, the UK, and the US.

Company

hirify.global is a startup empowering companies with technology to manage their climate impact and make a meaningful contribution to a better future.

What you will do

  • Design, implement, and maintain highly available, scalable, and secure cloud infrastructure for the hirify.global Data platform and AI workloads.
  • Improve and expand the observability strategy, using Datadog for Rails applications and AI workloads.
  • Develop scalable infrastructure to support machine learning model training, deployment, and monitoring.
  • Support critical infrastructure scaling projects and contribute to high-traffic systems design.
  • Manage day-to-day operations, including on-call duties, capacity planning, and proactive system health monitoring.
  • Implement security measures and data protection protocols, and ensure compliance with SOC 2 Type 2 and ISO 27001.

Requirements

  • Engineering degree or 3+ years of DevOps/SRE experience; 5+ years preferred.
  • Strong knowledge of AWS (ECS/Fargate), Docker, Terraform, and PostgreSQL at scale (sharding, clustering, high-volume scenarios preferred).
  • Datadog expertise strongly preferred.
  • Experience with continuous integration and continuous deployment, high-traffic multi-tenant systems, and database scaling strategies.
  • Strong operational mindset with experience in on-call rotations and production incident management.
  • Experience improving observability and monitoring systems.
  • English: C1+ required (fluent).

Nice to have

  • Ruby on Rails experience.
  • Snowflake experience.
  • Change Data Capture and data pipeline experience.
  • Familiarity with ARC (Actions Runner Controller) and Kubernetes.

Culture & Benefits

  • Opportunity to join an exciting startup with a vision to change the world by managing climate impact.
  • Flexible work model to balance personal and professional commitments.
  • A connected and engaged remote work culture with colleagues around the world.
  • Part of a B Corporation dedicated to creating successful businesses that benefit society and the planet.
