Command Center Systems Engineer (AI)
Job description
TL;DR
Command Center Systems Engineer (AI): Build and govern the operational backbone for global GPU clusters, with an emphasis on SOPs, escalation matrices, and change management. The role focuses on optimizing the incident management lifecycle and ensuring 24/7 operational continuity in a hyper-growth cloud environment.
Location: Hybrid (Kenilworth, NJ). Remote may be considered for candidates located more than 30 miles from an office. Must be a U.S. person (citizen, green card holder, refugee, or asylee).
Salary: $109,000 – $145,000
Company
The company is a specialized cloud provider for AI, delivering high-performance infrastructure to AI labs, startups, and global enterprises.
What you will do
- Govern and maintain all SOPs, MOPs, and EOPs to ensure operational accuracy and consistency.
- Own the escalation framework for incident triage, cross-functional coordination, and leadership notification.
- Lead change management governance to ensure structured and safe infrastructure changes.
- Develop and manage shift structures, handover protocols, and 24/7 staffing frameworks.
- Oversee the incident management lifecycle, facilitating RCAs and tracking corrective actions.
- Define and track operational KPIs (MTTD, MTTR, uptime) and deliver performance reporting to leadership.
Requirements
- 5+ years of experience in data center operations or mission-critical 24/7 infrastructure.
- Proven track record of building and scaling operational frameworks, SOPs, and escalation matrices.
- Strong project and program management skills with cross-functional alignment capability.
- Excellent written and verbal communication for creating clear technical documentation.
- Experience facilitating root cause analysis (RCA) and driving closure on corrective actions.
- Must be a U.S. person as defined by U.S. Government export regulations.
Nice to have
- Lean, Six Sigma, or other process improvement certifications.
- Experience in hyperscale, cloud, or AI infrastructure environments.
- Background in training program development or operational enablement.
- Familiarity with ITSM platforms like Jira or ServiceNow.
Culture & Benefits
- 100% company-paid medical, dental, and vision insurance.
- 401(k) with generous employer match and Employee Stock Purchase Program (ESPP).
- Flexible PTO and paid parental leave.
- Catered lunch provided daily at office and data center locations.
- Comprehensive wellness support through Spring Health and family-forming support via Carrot and Kinside.
- Casual work environment focused on innovative disruption and entrepreneurial thinking.