Principal Machine Learning Engineer (AI)
Job description
TL;DR
Principal Machine Learning Engineer (AI): Architect and evolve critical ML systems, including training, inference, and evaluation infrastructure, with an emphasis on large-scale model performance and GPU efficiency. The role focuses on solving complex architectural challenges, building reproducible pipelines, and ensuring system reliability at scale.
Location: Must be based in Palo Alto, California (Hybrid role)
Company
A builder of proactive AI systems that understand context, plan actions, and execute work over time.
What you will do
- Architect and build large-scale ML systems spanning data, training, evaluation, and inference.
- Design reproducible, high-performance training pipelines on GPU infrastructure.
- Architect inference systems that balance latency, throughput, cost, and reliability.
- Implement evaluation pipelines covering model robustness, safety, and bias.
- Own production deployment including GPU optimization and memory efficiency.
- Collaborate with product teams to integrate ML systems into user-facing applications.
Requirements
- Strong background in deep learning and transformer-based architectures.
- Hands-on experience training, fine-tuning, or deploying large-scale ML models in production.
- Proficiency in at least one modern framework like PyTorch or JAX.
- Experience with distributed training frameworks such as DeepSpeed, FSDP, or Ray.
- Solid software engineering fundamentals for building production-grade systems.
- Knowledge of GPU optimization, memory efficiency, and mixed precision.
Nice to have
- Experience with LLM inference frameworks like vLLM or TensorRT-LLM.
- Background in scientific computing, compilers, or GPU kernels.
- Experience with RLHF pipelines (PPO, DPO).
- Experience training or deploying multimodal models.
Culture & Benefits
- Hands-on environment with high talent density and a small, world-class team.
- Fast-paced, collaborative decision-making culture.
- Opportunity to work on zero-to-one AI systems with global scale and impact.
Hiring process
- Evaluation by technical team members.
- 3 to 4 interviews conducted via virtual meetings or onsite.
- Transparent and efficient process with prompt decision-making.