Назад
Company hidden
22 часа назад

Staff Platform Reliability Engineer (AI)

185 000 - 230 000$
Формат работы
remote (только USA)
Тип работы
fulltime
Грейд
senior
Английский
b2
Страна
US
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

Staff Platform Reliability Engineer (SRE/Kubernetes): Building and optimizing the Tempest scale and reliability platform for AI-driven organizations with an accent on performance profiling and infrastructure automation. Focus on diagnosing system bottlenecks, improving observability through Prometheus and New Relic, and scaling the platform across multi-cloud environments.

Location: Remote (Must be based in the US)

Salary: $185,000 - $230,000 USD

Company

hirify.global builds software that helps the largest AI-driven organizations develop and operate advanced data science and AI solutions at scale.

What you will do

  • Serve as the technical owner of Tempest, ensuring the scale and reliability platform remains extensible and aligned with infrastructure needs.
  • Diagnose and resolve performance bottlenecks and resource misconfigurations in production Kubernetes environments.
  • Deliver data-driven sizing recommendations for customer-facing documentation based on empirical testing.
  • Enhance observability by improving Prometheus and New Relic instrumentation to pinpoint root causes.
  • Operationalize and enable scale testing across multiple cloud providers.
  • Build infrastructure automation to increase the operational efficiency of a small engineering team.

Requirements

  • Experience in SRE or platform engineering operating distributed systems in production Kubernetes.
  • Strong proficiency in Python for orchestration, infrastructure automation, and systems integration.
  • Expertise with observability stacks including Prometheus, Grafana, and New Relic.
  • Proven ability to profile services and identify resource bottlenecks to ship durable fixes.
  • Familiarity with performance and load testing tools such as Locust or k6.
  • Must be based in the United States.

Culture & Benefits

  • Comprehensive benefits package including medical, dental, and vision insurance.
  • Financial perks such as a 401(k) plan, equity, and company bonuses.
  • Wellness stipends to support employee health.
  • Remote-first, asynchronous work environment.
  • Culture of continuous improvement, teaching, and learning.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →