AI Platform Engineer (Training and Inference)
Job description
TL;DR
AI Platform Engineer (Training and Inference): Build and operate the compute layer for training, evaluating, and serving AI models, with an emphasis on distributed training, the LLM inference mesh, and model promotion lifecycles. The focus is on scaling Ray-based infrastructure, optimizing inference performance, and automating the full model deployment pipeline.
Location: Must be based in or able to work from San Francisco (Hybrid)
Company
The company is a leader in identity security, providing an AI-powered platform that manages and governs access to applications, data, and business processes for Fortune 500 companies and government institutions.
What you will do
- Manage the Ray ecosystem end-to-end, including KubeRay on GKE and distributed object stores.
- Operate distributed training pipelines using Ray Train, TorchTrainer, and multi-node H100 clusters.
- Build and scale the LLM inference mesh using vLLM, SGLang, and NVIDIA Triton.
- Design and operate the model routing layer with cost-aware fallback between SLMs and LLMs.
- Implement the full model promotion lifecycle, including shadow mode, A/B testing, and canary rollouts.
- Integrate RAG retrieval into the inference mesh for context-aware LLM responses.
Requirements
- Experience in ML engineering with a focus on ML platform or MLOps roles.
- Production-level expertise with the Ray ecosystem (Ray Train, Serve, Core, Data).
- Hands-on experience with LLM serving engines like vLLM, SGLang, or NVIDIA Triton.
- Strong proficiency in Python, PyTorch, and ML orchestration tools like Flyte.
- Knowledge of distributed training techniques including DDP, FSDP, and NCCL collectives.
- Understanding of model lifecycle operations, including registry, shadow/canary patterns, and automated rollbacks.
Nice to have
- Experience with model quantization techniques such as INT8/INT4/FP8 (GPTQ, AWQ, bitsandbytes).
Culture & Benefits
- Competitive total rewards package.
- Opportunities for career growth and advancement.
- Eligibility for discretionary bonus plans based on individual and organizational performance.
- Work in a high-impact environment focused on cutting-edge AI identity security.