TL;DR
Compute Efficiency Engineer (AI): Optimizing AI infrastructure for performance, cost-effectiveness, and sustainability. Focus on telemetry, cost attribution frameworks, and resolving performance bottlenecks across distributed systems.
Location: Must be in one of hirify.global's offices at least 25% of the time (San Francisco, CA | New York City, NY)
Salary: $1 - $2 USD
Company
hirify.global’s mission is to create reliable, interpretable, and steerable AI systems.
What you will do
- Build and evolve telemetry and monitoring systems to provide deep visibility into infrastructure performance, utilization, and costs across our cloud and datacenter fleets.
- Design and implement cost attribution frameworks for our multi-tenant infrastructure, enabling teams to understand and optimize their resource consumption.
- Identify and resolve performance bottlenecks and capacity hotspots through deep analysis of distributed systems at scale.
- Partner closely with cloud service providers and internal stakeholders to optimize cluster configurations, workload placement, and resource utilization across AI training and inference workloads.
- Drive architectural improvements and code-level optimizations across multiple services and platforms to deliver measurable utilization and performance gains.
Requirements
- 6+ years of relevant industry experience, including 1+ year leading large-scale, complex projects or teams.
- Deep expertise in distributed systems at scale, with a strong focus on infrastructure reliability, scalability, and continuous improvement.
- Strong proficiency in at least one programming language (e.g., Python, Rust, Go, Java).
- Hands-on experience with cloud infrastructure, including Kubernetes, Infrastructure as Code, and major cloud providers such as AWS or GCP.
- Experience optimizing end-to-end performance of distributed systems, including workload right-sizing and resource utilization tuning.
- Education requirements: at least a Bachelor's degree in a related field, or equivalent experience.
Nice to have
- Experience with machine learning infrastructure workloads and associated networking and communication technologies such as NCCL.
- Low-level systems experience, for example Linux kernel tuning and eBPF.
- Ability to quickly understand systems design tradeoffs and keep track of rapidly evolving software systems.
- Published work in performance optimization and scaling distributed systems.
Culture & Benefits
- Competitive compensation and benefits.
- Optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours.
- Lovely office space in which to collaborate with colleagues.
Hiring process
- If we make you an offer, we will make every reasonable effort to get you a visa, and we retain an immigration lawyer to help with this.