Company hidden
3 days ago

Senior Production Engineer (AI)

Work format
hybrid
Employment type
full-time
Grade
senior
English
B2
Country
US
Job from Hirify.Global, a list of international tech companies

Job description

TL;DR

Senior Production Engineer (AI): Take direct, hands-on ownership of critical tooling for an AI cloud platform, with an emphasis on driving reliability and delivery success and on strengthening operational foundations. The role focuses on building, debugging, and operating production systems that improve availability, scalability, and operational automation.

Location: This is primarily a hybrid role based in **Livingston, NJ; New York, NY; Sunnyvale, CA; or Bellevue, WA**. Remote work may be considered for candidates located more than 30 miles from an office, depending on specialized skill sets. To comply with U.S. Government export regulations, applicants **must be a U.S. person** (citizen, national, lawful permanent resident, refugee, or asylee) or be eligible to access export-controlled information without requiring export authorization.

Company

hirify.global is The Essential Cloud for AI™, delivering a cutting-edge cloud platform that powers the next wave of AI. It became a publicly traded company in March 2025.

What you will do

  • Take hands-on ownership of critical systems and frameworks, driving their architecture, implementation, and long-term evolution.
  • Lead end-to-end delivery of engineering projects to improve availability, scalability, and operational automation.
  • Build and maintain observability, alerting, automated remediation, and resilience testing for supported systems.
  • Participate in incident response, conduct deep root-cause investigations, and implement lasting fixes.
  • Improve runbooks, deployment workflows, and operational tooling to enhance production readiness.
  • Ship production code regularly in Python, Go, or similar languages, and participate in on-call rotations.

Requirements

  • 7+ years of engineering experience building and operating distributed systems or cloud platforms.
  • Demonstrated ability to debug complex production issues end-to-end across services, infrastructure layers, and automation.
  • Strong programming or scripting ability (Python, Go, or similar) with experience shipping and operating production services and tools.
  • Deep knowledge of cloud-native technologies and distributed system patterns, particularly Kubernetes.
  • Experience with modern observability stacks: metrics, tracing, structured logs, SLOs/SLIs, and incident lifecycle practices.
  • A track record of successfully delivering hands-on reliability improvements through engineering execution.

Nice to have

  • Experience building internal tooling, frameworks, or automation supporting high-availability cloud operations.
  • Familiarity with DR/BCP, service tiering, capacity planning, or chaos engineering.
  • Background operating or building large-scale AI or GPU-accelerated infrastructure.

Culture & Benefits

  • 100% employer-paid medical, dental, and vision coverage, plus life and disability insurance.
  • 401(k) with generous employer match and Employee Stock Purchase Program (ESPP).
  • Flexible PTO and comprehensive childcare support through Kinside.
  • Catered lunch daily (for office-based employees) and weekly massages (NY/NJ).
  • Dynamic, collaborative culture focused on innovation and learning.
  • Mental Wellness Benefits through Spring Health and Family-Forming support by Carrot.
