Performance Engineer (AI Inference)
Job description
TL;DR
Performance Engineer (AI Inference): Develop and optimize high-throughput inference systems for Claude, with an emphasis on throughput, latency, reliability, and correctness. The role focuses on cross-layer performance investigations, building observability tools, and closing the gap between actual fleet performance and theoretical rooflines.
Location: Hybrid; must be based in or attend one of the offices (San Francisco, New York City, or Seattle) at least 25% of the time.
Salary: $350,000 - $850,000 USD
Company
The company is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems for the benefit of society.
What you will do
- Conduct cross-layer performance investigations to identify root causes for gaps in throughput, latency, and reliability.
- Own and improve the correctness evaluation pipeline to validate model output quality across hardware platforms and serving configurations.
- Develop observability dashboards and modeling tools to make system interactions legible across the stack.
- Partner with kernel, serving, routing, and capacity teams to implement high-impact optimizations.
- Prioritize and stack-rank a large surface area of optimization opportunities based on impact and effort.
Requirements
- Hands-on experience in performance engineering, including profiling, roofline analysis, and root-cause investigation in production systems.
- Proficiency in Python and the ability to instrument and contribute to large production codebases.
- Strong data analysis skills using SQL, pandas, or similar tools.
- Ability to communicate quantitative results clearly to influence priorities across teams.
- Genuine interest in correctness as an engineering discipline, including numerics and regression detection.
- Must be based in or able to attend one of the US offices (San Francisco, New York City, or Seattle) at least 25% of the time.
Nice to have
- Experience with ML systems, specifically training or inference infrastructure and LLM serving stacks.
- Familiarity with GPU/TPU/accelerator performance concepts such as memory bandwidth and quantization.
- Reliability engineering experience for high-throughput services, including autoscaling and load balancing.
- Experience building observability or telemetry for distributed systems.
- Experience with model evaluation or numerical regression-detection pipelines.
Culture & Benefits
- Collaborative research environment based on the "big science" approach.
- Competitive compensation and optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours and high-quality collaborative office spaces.
- Visa sponsorship available for qualified candidates.