Technical Lead Manager, ML Infrastructure (AI)
Job Description
TL;DR
Technical Lead Manager, ML Infrastructure (AI): Lead the development and scaling of the core infrastructure that powers machine learning and self-hosted LLM applications, with an emphasis on low-latency serving, streaming feature ingestion, and distributed training. The focus is on building high-throughput GPU inference systems and improving developer ergonomics for ML scientists.
Location: Must be based in the US and live within commuting distance of hubs in New York, Seattle, Los Angeles, or San Francisco
Salary: $255K – $345K + Equity
Company
The company is the largest livestream shopping platform in North America and Europe, enabling users to buy, sell, and discover items across hundreds of categories.
What you will do
- Own the infrastructure powering AI and ML models for growth, recommendations, trust and safety, and fraud.
- Design and scale low-latency, high-throughput inference infrastructure capable of serving large models.
- Evolve real-time feature pipelines so that feedback from behavioral signals arrives within a single second.
- Lead the development of distributed training and inference pipelines leveraging GPUs and model/data parallelism.
- Build abstractions, APIs, and developer tools that make it easier for scientists to iterate on near-real-time features.
- Optimize system performance by managing resource utilization and caching features intelligently.
Requirements
- Must live within commuting distance of New York, Seattle, Los Angeles, or San Francisco hubs.
- 1+ years of experience as a technical lead manager (TLM) developing production machine learning systems at consumer-scale loads.
- 5+ years of hands-on software engineering experience building production systems for consumer-scale loads.
- Professional experience developing software in Python.
- Experience with operational, search, and key-value databases such as PostgreSQL, DynamoDB, Elasticsearch, and Redis.
- Experience with ML-specific tools and frameworks such as MLflow, LitServe, TorchServe, or Triton.
Nice to have
- Familiarity with AWS services, including SageMaker, Lambda, Kinesis, S3, EC2, and EKS/ECS.
- Experience with Apache Kafka and Flink.
- Proficiency with monitoring and logging tools such as Datadog and Grafana.
Culture & Benefits
- Comprehensive health insurance (Medical, Dental, Vision) and 401k with employer match up to 4%.
- Dedicated work-from-home support including setup allowance and monthly cell phone/internet stipend.
- Wellness and childcare allowances, plus lifetime benefits for family planning.
- 16 weeks of paid parental leave with a gradual return-to-work period.
- Monthly budget for dogfooding the app to develop a deep understanding of the product.
- Generous holiday and time-off policy.