Job description
TL;DR
Staff Senior Software Engineer (AI): Designing and building deployment infrastructure that moves inference code from merge to production, with an emphasis on capacity-aware scheduling and resource-constrained optimization. Focus on orchestrating validation, driving down cycle time, and optimizing rollout strategies across GPU, TPU, and Trainium fleets.
Location: Hybrid (San Francisco, New York City, or Seattle); must be in office at least 25% of the time
Salary: $320,000 – $485,000 USD
Company
Anthropic is a public benefit corporation dedicated to creating reliable, interpretable, and steerable AI systems that are safe and beneficial for society.
What you will do
- Own deployment orchestration to move validated inference builds into production across GPU, TPU, and Trainium fleets.
- Improve capacity-aware deployment scheduling to maximize throughput against constrained accelerator budgets.
- Develop deployment observability tools and dashboards to track code status and validation results.
- Architect pipelines that minimize serial dependencies and maximize parallelism, reducing merge-to-production cycle time.
- Optimize fleet rollout strategies for large-scale deployments across thousands of accelerator chips.
- Evolve self-service model onboarding to enable continuous deployment without manual intervention.
Requirements
- Experience designing systems for complex state machines and multi-stage pipelines.
- Proficiency with Kubernetes-based deployments, container orchestration, and rolling update mechanics.
- Experience building delivery infrastructure where resource constraints (capacity, bandwidth) shape the design.
- Proven track record of building automation that measurably improves deployment velocity and reliability.
- Comfort working across the stack, from backend services and databases to CLI tools and web UIs.
- Bachelor's degree or equivalent combination of education and professional experience.
Nice to have
- 5+ years of experience building large-scale deployment or release infrastructure.
- Production experience with Python and/or Rust.
- Experience with ML inference or training infrastructure deployment across multiple accelerator types.
- Background in capacity planning, bin-packing, or resource-constrained scheduling.
- Experience with progressive delivery, such as canary/soak testing and blue-green deployments.
Culture & Benefits
- Competitive compensation and benefits with optional equity donation matching.
- Generous vacation and parental leave.
- Flexible working hours and high-quality collaborative office spaces.
- Visa sponsorship available for eligible candidates.