Senior Technical Product Manager (Fleet Operations, GenAI)
Job description
TL;DR
Senior Technical Product Manager, Fleet Operations (GenAI Cloud): Own the product strategy for the day 0–2+ operational software that runs a global GPU fleet: provisioning, bring-up, monitoring, incident response, repair, firmware, and decommissioning, with an emphasis on availability, utilisation, and time-to-recover. Focus on translating operational pain into scalable tooling and automation, defining key metrics, and driving multi-quarter cross-functional initiatives across live clusters.
Location: London
Company
The company is building a vertically integrated GenAI cloud platform, owning its data centres, software, and applications, powered by sustainable technology solutions.
What you will do
- Own the strategy and roadmap for a major Fleet Operations product area such as provisioning, fleet health, incident workflows, firmware, or inventory management.
- Lead cross-functional initiatives from problem framing to rollout across GPU clusters with Fleet Software, SRE, data centre ops, and Support.
- Shadow on-call, support, and repair workflows to turn toil into product capabilities, automation, and platform features.
- Define and drive against fleet metrics: availability, utilisation, MTTR, time-to-bring-up, failure rates, ticket deflection.
- Partner with engineering on architecture for bare metal, orchestration, observability, and control planes; drive incident reviews into product fixes.
- Mentor junior PMs and raise the quality of PRDs, reviews, and decisions team-wide; represent the area in planning and leadership forums.
Requirements
- 5–8 years of product management experience in software/technology, owning infrastructure/platform/operations products.
- Strong technical fluency in large-scale systems: able to lead discussions on architecture and trade-offs in provisioning, orchestration, observability, and control planes.
- Experience building for operators (SREs, NOC/support, data centre techs) with deep workflow understanding.
- Proven ability to turn ambiguous ops problems into shipped products that improve reliability, efficiency, and recovery.
- Experience mentoring or leading peers; excellent communication with engineers, operators, and executives.
Nice to have
- CS/engineering degree or prior engineer/SRE experience.
- Background in cloud infra, bare-metal provisioning, fleet/hardware lifecycle, observability, incident tools.
- Experience with OpenStack Ironic/MAAS/Tinkerbell, NetBox/Device42, Jira Service Management/ServiceNow, Grafana/Prometheus/Datadog, GPU/data centre ops.
- Experience in a high-growth/early-stage environment where the product is built alongside the fleet.
Culture & Benefits
- Relentless innovation, ownership, accountability; build trust via openness and transparency.
- Collaboration with swift, respectful adaptability and resilience.
- Inclusive, diverse, equitable workplace encouraging all backgrounds.