Назад
Company hidden
2 дня назад

Site Reliability Engineer (AI)

100 000 - 170 000$
Тип работы
fulltime
Грейд
middle
Английский
b2
Страна
US
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

Site Reliability Engineer (AI): Building and improving automation, tooling, and infrastructure for GPU cloud AI workloads with an accent on system stability, scalability, and operational efficiency. Focus on troubleshooting performance issues in data center environments and maintaining SLOs/SLIs.

Location: US

Salary: $100,000 - $170,000 USD

Company

hirify.global is a GPU cloud engineered for AI, providing high-performance, cost-efficient infrastructure for AI-native startups and global enterprises.

What you will do

  • Build and improve automation and tooling to support AI workloads.
  • Support the development of operational systems and platform services.
  • Define and maintain basic SLOs/SLIs and monitoring dashboards.
  • Participate in incident response, troubleshooting, and post-incident reviews.
  • Collaborate with Engineering, Networking, and Infrastructure teams to improve system stability.
  • Contribute to the overall availability, scalability, and operational efficiency of the platform.

Requirements

  • 2–5 years of experience in SRE, Systems Engineering, or Software Engineering within a data center environment.
  • 2+ years of programming skills in Python, Go, or similar languages.
  • Working knowledge of Linux systems, networking concepts, and distributed systems.
  • Experience troubleshooting system or application issues in production environments.
  • Familiarity with observability tools including logs, metrics, and dashboards.
  • Must be based in the US.

Nice to have

  • Exposure to Kubernetes, cloud platforms, or virtualized/bare-metal environments.
  • Experience in AI, GPU workloads, or high-performance computing (HPC).
  • Understanding of high-performance networking concepts such as InfiniBand or RDMA.
  • Experience with production monitoring and alerting systems.

Culture & Benefits

  • Competitive compensation package including base salary and equity with annual reviews.
  • Dynamic progression plan tailored to individual ambitions and impact.
  • Flexible workplace policy focused on autonomy and trust.
  • Comprehensive benefits package including medical, dental, vision, and retirement plan participation.
  • Flexible paid time off and parental leave.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →