TL;DR
Product Manager (AI): Building and optimizing compute platform systems for AI infrastructure, with an emphasis on scheduling, orchestration, and capacity management. Focus on defining product strategy, managing trade-offs between utilization and cost, and enabling researcher velocity for diverse AI workloads.
Location: Hybrid (San Francisco, New York City, Seattle), requiring office presence at least 25% of the time. Visa sponsorship is available.
Salary: $305,000 - $385,000 USD annually.
Company
hirify.global is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems for societal benefit.
What you will do
- Partner with cross-functional teams to build scheduling, orchestration, and capacity management systems for GPU and accelerator clusters.
- Drive the evolution of the compute platform to support diverse AI workloads, from large-scale training to real-time inference.
- Define and own the product strategy and roadmap for job scheduling primitives, capacity allocation, and quota management.
- Lead the trade-off framework for utilization efficiency, job latency, cost, and reliability.
- Collaborate on capacity planning models, demand forecasting, and cost-to-serve analytics.
- Build and champion observability tools for real-time visibility into cluster health and resource usage.
Requirements
- 7+ years of product management experience with compute infrastructure, distributed systems, or scheduling/orchestration platforms.
- Experience taking technical infrastructure products from early stages to scale.
- Ability to internalize complex technical systems (job schedulers, cluster managers, resource orchestrators) and translate that understanding into a comprehensive product vision.
- Proficiency in discussing scheduling algorithms with engineers, capacity economics with finance, and infrastructure strategy with leadership.
- Bachelor's degree in a related field or equivalent experience.
- Must be able to work from San Francisco, New York City, or Seattle offices at least 25% of the time.
Nice to have
- Experience building or scaling job scheduling, resource orchestration, or workload management systems (e.g., Kubernetes, Slurm, Borg, YARN).
- Deep familiarity with GPU/accelerator scheduling challenges, including gang-scheduling, topology-aware placement, and preemption.
- Experience defining and enforcing SLAs and resource guarantees for compute workloads.
- Capacity planning experience across cloud and on-premises infrastructure.
- Experience with observability and efficiency tooling for distributed infrastructure.
Culture & Benefits
- Focus on highest-impact AI research, working as a single cohesive team on a few large-scale research efforts.
- Competitive compensation and benefits, with optional equity donation matching.
- Generous vacation and parental leave, and flexible working hours.
- Collaborative group with frequent research discussions.
- Lovely office spaces in San Francisco, New York City, and Seattle.