Company hidden
4 days ago

Performance Engineer (AI Inference)

$350,000 - $850,000
Work format
hybrid
Employment type
full-time
English
B2
Country
US
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Performance Engineer (AI Inference): Developing and optimizing high-throughput inference systems for Claude, with an emphasis on throughput, latency, reliability, and correctness. The role focuses on cross-layer performance investigations, building observability tools, and closing the gap between actual fleet performance and theoretical rooflines.

Location: Hybrid; must be based in or attend one of the offices (San Francisco, New York City, or Seattle) at least 25% of the time.

Salary: $350,000 - $850,000 USD

Company

hirify.global is a public benefit corporation focused on creating reliable, interpretable, and steerable AI systems for the benefit of society.

What you will do

  • Conduct cross-layer performance investigations to identify the root causes of gaps in throughput, latency, and reliability.
  • Own and improve the correctness evaluation pipeline to validate model output quality across hardware platforms and serving configurations.
  • Develop observability dashboards and modeling tools to make system interactions legible across the stack.
  • Partner with kernel, serving, routing, and capacity teams to implement high-impact optimizations.
  • Prioritize and stack-rank a large surface area of optimization opportunities based on impact and effort.

Requirements

  • Hands-on experience in performance engineering, including profiling, roofline analysis, and root-cause investigation in production systems.
  • Proficiency in Python and the ability to instrument and contribute to large production codebases.
  • Strong data analysis skills using SQL, pandas, or similar tools.
  • Ability to communicate quantitative results clearly to influence priorities across teams.
  • Genuine interest in correctness as an engineering discipline, including numerics and regression detection.
  • Must be based in or able to attend the US offices (SF, NYC, or Seattle) at least 25% of the time.

Nice to have

  • Experience with ML systems, specifically training or inference infrastructure and LLM serving stacks.
  • Familiarity with GPU/TPU/accelerator performance concepts such as memory bandwidth and quantization.
  • Reliability engineering experience for high-throughput services, including autoscaling and load balancing.
  • Experience building observability or telemetry for distributed systems.
  • Experience with model evaluation or numerical regression-detection pipelines.

Culture & Benefits

  • Collaborative research environment based on the "big science" approach.
  • Competitive compensation and optional equity donation matching.
  • Generous vacation and parental leave.
  • Flexible working hours and high-quality collaborative office spaces.
  • Visa sponsorship available for qualified candidates.
