Senior AI Platform Engineer (Sovereign AI)
Job description
TL;DR
Senior AI Platform Engineer (Sovereign AI): Build the agentic AI platform that turns LLMs into reliable, production-grade capabilities, with an emphasis on agent orchestration frameworks, LLM gateways, evaluation pipelines, tool-calling infrastructure, and retrieval systems. The role centers on designing multi-agent workflows, productionizing backend services that integrate LLMs, and optimizing performance and cost across cloud and on-prem environments.
Tel Aviv
Company
The company combines AI and human expertise to create cybersecurity products that protect nations and critical infrastructure, with a sovereign AI platform operating in on-premise, private cloud, and air-gapped environments.
What you will do
- Design and build agentic systems including single and multi-agent workflows with planning, memory, context engineering, and tool use for internal automation and product capabilities.
- Build and operate the AI platform layer: LLM gateways, prompt management, structured output handling, tool calling, and cost/latency optimization on Kubernetes.
- Own the agent framework layer: orchestration primitives, execution environments, state management, and sandboxed tool execution.
- Develop evaluation infrastructure for agent behavior including automated evals for quality, safety, latency, cost, and human-in-the-loop oversight.
- Productionize backend services (APIs, gRPC, async workers) with error handling, retries, circuit breakers, and high availability.
- Build and own RAG pipelines and retrieval systems: indexing, embedding, vector database management, and relevance tuning.
- Optimize AI stack performance and cost: model routing, caching, batching, and inference management.
- Ship shared tooling, libraries, SDKs, agent templates, and documentation while collaborating with the ML, Data Platform, and DevOps teams.
Requirements
- 5+ years in backend or distributed systems engineering, including 2+ years on production AI/ML/LLM systems.
- Strong Python, Go, or Java skills; solid grounding in system architecture, API design, testing, and secure coding.
- Experience designing agent orchestration, tool-use systems, and autonomous workflows (e.g., LangGraph or equivalent).
- Experience building production APIs and services (e.g., FastAPI), with async programming, high availability, and reliability patterns.
- LLM integration: SDKs/APIs, context engineering, structured outputs, tool calling, model routing.
- RAG/retrieval: embedding pipelines, vector DBs (Milvus, Qdrant, Pinecone), chunking, relevance tuning.
- LLM/agent evals and observability for non-deterministic systems.
Nice to have
- Kubernetes, AWS, Terraform IaC, CI/CD, container orchestration.
- MCP or similar tool-use protocols.
- Hands-on ML: model training, fine-tuning, ML pipelines.