Performance Engineer (AI Inference)

$350,000 - $850,000

Work format

hybrid

Employment type

full-time

English

Country

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Performance Engineer (AI Inference): Develop and optimize high-throughput inference systems for Claude, with an emphasis on throughput, latency, reliability, and correctness. The role centers on cross-layer performance investigations, building observability tools, and closing the gap between actual fleet performance and theoretical rooflines.

Location: Hybrid; must be based in or attend one of the offices (San Francisco, New York City, or Seattle) at least 25% of the time.

Salary: $350,000 - $850,000 USD

Company

hirify.global is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems for the benefit of society.

What you will do

Conduct cross-layer performance investigations to identify root causes for gaps in throughput, latency, and reliability.
Own and improve the correctness evaluation pipeline to validate model output quality across hardware platforms and serving configurations.
Develop observability dashboards and modeling tools to make system interactions legible across the stack.
Partner with kernel, serving, routing, and capacity teams to implement high-impact optimizations.
Prioritize and stack-rank a large surface area of optimization opportunities based on impact and effort.

Requirements

Hands-on experience in performance engineering, including profiling, roofline analysis, and root-cause investigation in production systems.
Proficiency in Python and the ability to instrument and contribute to large production codebases.
Strong data analysis skills using SQL, pandas, or similar tools.
Ability to communicate quantitative results clearly to influence priorities across teams.
Genuine interest in correctness as an engineering discipline, including numerics and regression detection.
Must be based in or able to attend the US offices (SF, NYC, or Seattle) at least 25% of the time.

Nice to have

Experience with ML systems, specifically training or inference infrastructure and LLM serving stacks.
Familiarity with GPU/TPU/accelerator performance concepts such as memory bandwidth and quantization.
Reliability engineering experience for high-throughput services, including autoscaling and load balancing.
Experience building observability or telemetry for distributed systems.
Experience with model evaluation or numerical regression-detection pipelines.

Culture & Benefits

Collaborative research environment based on the "big science" approach.
Competitive compensation and optional equity donation matching.
Generous vacation and parental leave.
Flexible working hours and high-quality collaborative office spaces.
Visa sponsorship available for qualified candidates.
