Customer Engineer (AI)
Job description
TL;DR
Customer Engineer (AI): Owning the post-sales technical relationship with strategic and enterprise customers, ensuring their AI models stay healthy and their roadmap requests are heard, with an emphasis on infrastructure debugging, AI/ML performance, and incident command. The focus is on resolving issues across Kubernetes, GPUs, networking, and model serving, and on translating customer pain points into product improvements.
Location: San Francisco, New York, or Remote within the United States
Salary: $165K – $330K
Company
The company powers mission-critical inference for dynamic AI companies by uniting applied AI research, flexible infrastructure, and seamless developer tooling.
What you will do
- Serve as the first responder for post-sales customer issues, triaging and resolving Tier 1 and Tier 2 tickets.
- Diagnose runtime issues related to latency, memory behavior, GPU utilization, concurrency, and model lifecycle management.
- Lead incident response during outages and escalations, coordinating across Product, SRE, Sales, and Engineering.
- Own customer communication through resolution, including delivering root-cause analyses.
- Set up and maintain proactive monitoring and alerts for all customer production models.
- Translate user feedback into roadmap signals, documentation improvements, and product enhancements.
Requirements
- Deep Kubernetes troubleshooting expertise, including resource debugging, pod/runtime analysis, and log-based diagnostics with observability tooling (Grafana, Loki, Prometheus).
- Strong infrastructure debugging across container orchestration, networking, and service dependencies, with hands-on production cluster experience.
- Experience managing high-severity incidents with major customers, including SLAs, war rooms, and post-incident reviews.
- Proven project management skills with an ownership mindset.
- Ability to translate recurring technical pain points into roadmap-level insights and product improvements.
- Strong communication skills and executive presence during high-visibility situations.
- 3+ years of experience in a fast-paced, high-growth, or customer-facing engineering environment.
Nice to have
- Familiarity with high-performance AI model serving, including troubleshooting ML pipelines from preprocessing through inference.
- Experience with ticketing and incident-response platforms such as Pylon or Zendesk.
- Hands-on experience with Helm, Flux, CI/CD tooling, or scripting automations for deployment and operational workflows.
- Background in SRE, DevOps, or forward-deployed engineering roles at an infrastructure company.
Culture & Benefits
- Competitive compensation, including meaningful equity.
- 100% coverage of medical, dental, and vision insurance for employees and their dependents.
- Generous PTO policy, including a company-wide Winter Break.
- Paid parental leave.
- Company-facilitated 401(k).
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities.