TL;DR
Senior Software Engineer I (AI): Leading design, architecture, and reliability improvements for a Kubernetes-native AI inference platform, with a focus on optimizing latency and throughput and meeting P99 SLAs. The role centers on implementing advanced optimizations, strengthening incident posture, and mentoring engineers in distributed systems and cloud services.
Location: Hybrid in Sunnyvale, CA or Bellevue, WA. Remote work may be considered for candidates located more than 30 miles from an office. New hires will be invited to attend onboarding at one of our hubs within their first month, and teams gather quarterly. Must be a U.S. person (U.S. citizen/national, lawful permanent resident, refugee, or asylee) due to export-controlled information access requirements.
Salary: $139,000–$204,000
Company
hirify.global is The Essential Cloud for AI™ that delivers a platform of technology, tools, and teams enabling innovators to build and scale AI with confidence.
What you will do
- Lead design reviews and drive architecture within the team, decomposing multi-service work into clear milestones.
- Define and own SLIs/SLOs, ensuring post-incident actions land and reliability improves release-over-release.
- Implement advanced optimizations such as micro-batch schedulers, speculative decoding, and KV-cache reuse, and quantify their impact.
- Strengthen incident posture by focusing on capacity planning, autoscaling policy, graceful degradation, and rollback/traffic-shift strategies.
- Mentor IC1/IC2 engineers and review cross-team designs to elevate coding/testing standards.
- For IC4: Own an area spanning multiple services and teams, such as request routing & adaptive scheduling, cost-per-token analytics, or GPU resource isolation.
Requirements
- 3–8 years of industry experience building distributed systems or cloud services.
- Strong coding skills in Python or Go, with deep familiarity with networked systems and performance.
- Hands-on experience with Kubernetes at production scale, CI/CD, and observability stacks (Prometheus, Grafana, OpenTelemetry).
- Practical knowledge of inference internals, including batching, caching, mixed precision (BF16/FP8), and streaming token delivery.
- Proven track record of improving tail latency (P95/P99) and service reliability through metrics-driven work.
- Must be a U.S. person (U.S. citizen/national, lawful permanent resident, refugee, or asylee) to access export-controlled information.
Nice to have
- Contributions to inference frameworks (vLLM, Triton, TensorRT-LLM, Ray Serve, TorchServe).
- Experience with CUDA kernels, NCCL/SHARP, RDMA/NUMA, or GPU interconnect topologies.
- Leading multi-team initiatives or partnering with customers on mission-critical launches.
Culture & Benefits
- Hybrid work environment, with remote work considered for candidates located more than 30 miles from an office.
- Flexible PTO and a casual work environment.
- 401(k) with a generous employer match and Employee Stock Purchase Program.
- Medical, dental, and vision insurance 100% paid for by hirify.global.
- Company-paid life insurance, short- and long-term disability insurance, Flexible Spending Account, and Health Savings Account.
- Tuition reimbursement, mental wellness benefits through Spring Health, family-forming support by Carrot, and paid parental leave.
- Flexible, full-service childcare support with Kinside.
- Catered lunch each day in office and data center locations.
- An entrepreneurial outlook, independent thinking, and an environment that encourages collaboration and innovation.