Command Center Operations & Governance Specialist (AI)
Job description
TL;DR
Command Center Operations & Governance Specialist (AI): Building and evolving the operational backbone for high-intensity GPU cluster infrastructure, with an emphasis on process frameworks, standards, and operational discipline. The role focuses on designing escalation matrices, managing change governance, and ensuring 24/7 operational continuity at scale.
Location: Hybrid in Kenilworth, NJ. Remote work may be considered for candidates located more than 30 miles from an office. Must be a U.S. person (citizen, lawful permanent resident, refugee, or asylee) to comply with U.S. Government export regulations.
Salary: $109,000 – $145,000
Company
is the essential cloud for AI, delivering a platform of technology and tools that enables innovators to build and scale AI breakthroughs.
What you will do
- Govern and maintain all SOPs, MOPs, and EOPs across the Command Center to ensure accuracy and consistency.
- Own the escalation framework, defining paths for incident triage, cross-functional coordination, and leadership notification.
- Lead change management governance to ensure infrastructure changes follow safe, structured processes.
- Develop shift structures, handover protocols, and staffing frameworks for 24/7 operational continuity.
- Manage the incident management lifecycle, including response coordination, RCA facilitation, and corrective action tracking.
- Define and track operational KPIs such as MTTD, MTTR, and uptime, delivering performance reports to leadership.
Requirements
- 5+ years of experience in data center operations or mission-critical infrastructure in a 24/7 environment.
- Proven track record of building and scaling operational frameworks, SOPs, and escalation matrices.
- Strong project and program management skills with the ability to drive cross-functional alignment.
- Excellent written and verbal communication skills, with the ability to translate complex requirements into documentation.
- Experience facilitating root cause analysis (RCA) and driving corrective actions to closure.
- Must be a U.S. person for export control compliance.
Nice to have
- Lean, Six Sigma, or other process improvement certifications.
- Experience in hyperscale, cloud, or AI infrastructure environments.
- Background in training program development or operational enablement.
- Familiarity with ITSM or ticketing platforms like Jira or ServiceNow.
Culture & Benefits
- 100% company-paid medical, dental, and vision insurance.
- 401(k) with generous employer match and Employee Stock Purchase Program (ESPP).
- Flexible PTO and paid parental leave.
- Catered daily lunch at office and data center locations.
- Comprehensive mental wellness benefits and family-forming support.