Software Engineer, Inference - Performance Optimization (AI)
Job Description
TL;DR
Software Engineer, Inference - Performance Optimization (AI): build and optimize the inference stack across the application, model, and fleet layers, with an emphasis on reducing latency and cost-to-serve. The role focuses on developing high-fidelity performance models, identifying system bottlenecks, and improving hardware efficiency.
Location: San Francisco, USA
Salary: $295K - $555K + Equity
Company
An AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity.
What you will do
- Build and refine performance models that translate microbenchmark results into cost-to-serve estimates.
- Analyze end-to-end inference workloads across applications, models, and fleet infrastructure.
- Improve tooling that pinpoints latency and throughput bottlenecks across layers.
- Partner with cross-functional teams to turn performance insights into concrete improvements.
- Project how future architectural changes affect inference performance and capacity.
Requirements
- Deep expertise in performance profiling, benchmarking, analysis, and optimization.
- Strong ability to reason from first principles about distributed systems and model inference.
- Experience working across abstraction layers, from application behavior to kernels, accelerators, and networking.
- Knowledge of fleet scheduling and hardware efficiency.
- Must be based in or authorized to work in the US (San Francisco).
Culture & Benefits
- Opportunity to work at the forefront of AI research and deployment.
- Competitive compensation package including base salary and equity.
- Collaborative environment working with world-class engineering and research teams.
- Commitment to safety and human-centric AI development.