Company hidden
Posted 2 days ago

Site Reliability Engineer (AI)

Work format: hybrid
Employment type: full-time
Grade: senior
English: B2
Country: US
Relocation: US

Job description

TL;DR

Site Reliability Engineer (AI/DevOps): Shape the reliability, scalability, and performance of the platform and customer-facing applications, with an emphasis on infrastructure automation and high-availability ML workloads. The role focuses on building a cloud-agnostic platform and optimizing HPC clusters to ensure seamless model training and inference.

Location: Based in New York, NY (Hybrid: at least 3 days per week in office). Candidates willing to relocate to the USA are welcome.

Company

hirify.global is a pioneering AI company democratizing high-performance, open-source, and cutting-edge models and solutions.

What you will do

  • Design and maintain scalable, fault-tolerant infrastructure for web services and ML workloads.
  • Manage production systems: troubleshoot issues and implement monitoring and alerting.
  • Automate infrastructure deployment and orchestration using Kubernetes, Flux, and Terraform.
  • Collaborate with AI/ML researchers to enable reproducible model-training experiments.
  • Develop a cloud-agnostic platform as an abstraction layer between science and infrastructure.
  • Contribute to open-source projects, research publications, and technical documentation.

Requirements

  • Master’s degree in Computer Science, Engineering, or a related field.
  • 7+ years of experience in a DevOps/SRE role.
  • Strong experience with cloud computing, distributed systems, and reliability KPIs.
  • Hands-on proficiency with Docker, Kubernetes, Prometheus, Grafana, and ELK Stack.
  • Proficiency in Python, Go, or Bash, plus strong networking and security knowledge.
  • Must be based in or willing to relocate to NYC.

Nice to have

  • Experience in AI/ML environments.
  • Knowledge of High-Performance Computing (HPC) systems and Slurm.
  • Experience with AI-oriented solutions like Fluidstack, CoreWeave, or Vast.

Culture & Benefits

  • Competitive salary and equity package.
  • Comprehensive medical, dental, and vision insurance for employees and families.
  • 401(k) with 6% matching.
  • 18 days of PTO and visa sponsorship.
  • Monthly stipends for meals ($400), gym membership ($120), and transportation.
  • Access to BetterUp coaching on a voluntary basis.
