TL;DR
Technical Product Manager (AI): Defining how customers experience GPU clusters, focusing on their reliability, performance, and overall usability at scale. Focus on shaping user interaction with clusters, from dashboards and notifications to partitioning, node management, and training observability.
Location: Amsterdam, Netherlands; Berlin, Germany; France; Netherlands; Prague, Czech Republic; Remote - Europe
Company
hirify.global is leading a new era in cloud computing to serve the global AI economy.
What you will do
- Own key tracks in Cluster Experience: reliability, performance, and user experience for distributed ML workloads.
- Define product direction from problem discovery → design → delivery → adoption, working closely with engineering and research teams.
- Drive cross-functional execution across compute, networking, storage, observability, and platform teams.
- Perform deep customer research: interviews, analytics, and workload studies to identify bottlenecks across hardware, network, scheduler, and runtime.
- Translate state-of-the-art ML papers ideas into practical, scalable product features for large GPU clusters.
- Shape how users interact with clusters - from dashboards and notifications to partitioning, node management, and training observability.
Requirements
- 3–5+ years of experience in product management, ML infrastructure/MLOps, distributed systems engineering, or cloud architecture.
- Strong technical foundation in computer science, distributed systems, or ML infrastructure.
- Hands-on familiarity with ML training, ideally using orchestrators like Slurm, Kubernetes, Ray, or similar systems.
- Proven ability to ship technically complex features with multiple engineering teams.
- Excellent communicator capable of influencing engineering, research, and customer stakeholders.
- Experience with product analytics, data-driven prioritization, and experiment design.
Nice to have
- Experience working with GPU platforms, Infiniband/RDMA networking, or HPC systems.
- Understanding of modern ML frameworks (PyTorch, DeepSpeed, FSDP, NCCL, etc.).
- Knowledge of ML training efficiency: Goodput, MFU, scheduling, health checks.
- Exposure to LLM training, distributed data/ZeRO/FSDP strategies, or transformer inference.
- Customer-facing technical experience (supporting ML or infrastructure workloads).
Culture & Benefits
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within hirify.global.
- Flexible working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.
Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →