ML Platform Engineering Team Lead (Sovereign AI)
Job description
TL;DR
ML Platform Engineering Team Lead (Sovereign AI): Set the technical direction for the ML platform, including training pipelines, model serving, feature stores, experiment tracking, and compute orchestration, with an emphasis on large language models in air-gapped and on-premise environments. The role focuses on leading ML engineers, establishing standards for reproducible experiments, automated evaluations, and CI/CD for models, maintaining operational excellence, and collaborating across teams to support the full model lifecycle from training to production inference.
Location: Tel Aviv
Company
The company combines AI and human expertise to protect nations and critical infrastructure with a sovereign AI cybersecurity platform operating in on-premise, private cloud, and air-gapped environments.
What you will do
- Set the technical direction for the ML platform through RFCs, prototypes, design reviews, and build-vs-buy decisions.
- Lead and grow a team of ML Engineers via hiring, mentoring, pair programming, and code/design reviews.
- Contribute to critical systems, debug production issues, and stay close enough to the codebase to make informed architecture decisions.
- Own operational excellence for model serving, including SLAs, capacity planning, and compute costs.
- Establish ML engineering standards for reproducible experiments, automated evals, model packaging, CI/CD, and observability.
- Support the full model lifecycle and collaborate with the Data Platform, AI, Data Science, and Product teams.
- Measure and improve developer experience metrics such as deploy friction and CI turnaround.
Requirements
- 6+ years in software, ML, or platform engineering, with hands-on experience in ML infrastructure at scale.
- 2+ years leading engineering teams, including hiring, mentoring, and design reviews.
- Strong Python, distributed systems design, testing, secure coding, API design, CI/CD, and production ownership.
- Experience with model serving frameworks (Triton, TorchServe, vLLM, Ray Serve), deployment pipelines, and inference optimization.
- Experience with distributed training pipelines (PyTorch, JAX), experiment orchestration, and reproducibility.
- Experience with ML lifecycle tooling (feature stores, MLflow, Weights & Biases), data pipelines (Spark, Airflow), and streaming ingestion.
- Comfortable with AI coding tools such as Cursor, Claude Code, or Copilot.
Nice to have
- Experience in constrained environments such as on-premise, private cloud, or air-gapped deployments.
- Hands-on experience with simulation environments, synthetic data, or reinforcement learning.
- Platform/infra skills: Kubernetes, AWS, Terraform, CI/CD, observability, incident response.
- Hands-on data science or applied ML experience.