📍 Location: Serbia or Armenia (we are open to discussing other countries as well)
🏢 Remote work is possible
About the product
At FlameTree, we are building a platform for creating AI agents that help businesses scale customer support, lead follow-up, and sales across multiple communication channels — both inbound and outbound.
Our AI agents work with knowledge bases, communicate in real time, and drive conversions across messaging platforms. The platform supports 150+ languages and integrates with WhatsApp, email, and web applications, offering strong security and high scalability for business growth.
🎯 Responsibilities:
• Design and develop the core agent layer responsible for orchestrating interactions with LLMs.
• Build and maintain complex conversational logic: state machines, agent workflows, and orchestration pipelines.
• Control and shape LLM behavior: prompt design, structured outputs, deterministic flows.
• Manage conversational context: memory, history, token limits, and degradation strategies.
• Ensure reliability and predictability on top of inherently non-deterministic models.
• Implement resilient integrations with LLM providers (timeouts, retries, fallbacks, multi-provider strategies).
• Optimize latency and cost (streaming, batching, caching, token efficiency).
• Debug complex production issues (inconsistent outputs, race conditions, state loss).
• Contribute to system architecture: clear boundaries between agents, backend, and real-time components.
• Build observability around LLM pipelines (prompt/response logging, tracing, quality metrics).
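The resilient-integration work above (timeouts, retries, fallbacks, multi-provider strategies) can be sketched roughly as follows. This is an illustrative sketch only, not FlameTree's actual code: the provider callables and `ProviderError` are hypothetical stand-ins for real SDK clients.

```python
import asyncio

class ProviderError(Exception):
    """Stand-in for a transient provider failure (rate limit, 5xx, etc.)."""

async def call_with_fallback(prompt, providers, timeout=10.0, retries=2):
    """Try each provider in priority order; retry transient failures
    with exponential backoff before falling back to the next provider."""
    last_error = None
    for provider in providers:
        for attempt in range(retries):
            try:
                return await asyncio.wait_for(provider(prompt), timeout)
            except (ProviderError, asyncio.TimeoutError) as exc:
                last_error = exc
                await asyncio.sleep(0.1 * 2 ** attempt)  # backoff before retrying
    raise RuntimeError(f"all providers failed: {last_error!r}")

# Demo with stub providers: the first always fails, the second succeeds.
async def flaky(prompt):
    raise ProviderError("rate limited")

async def stable(prompt):
    return f"answer to: {prompt}"

print(asyncio.run(call_with_fallback("hello", [flaky, stable])))  # → answer to: hello
```

In production the same skeleton would also distinguish retryable from non-retryable errors and emit metrics per provider, but the order-of-providers-plus-bounded-retry shape stays the same.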
🎯 Requirements:
• 5+ years of backend development experience with strong Python skills (async, architecture, performance).
• Proven production experience with LLMs (not side projects): understanding of limitations, cost, and behavior.
• Experience building agent-based systems or complex orchestration logic (state machines, pipelines).
• Ability to make LLM behavior predictable (structured outputs, schema validation, guardrails).
• Strong debugging skills in non-deterministic systems.
• Deep understanding of API integrations (timeouts, retries, idempotency, backpressure).
• Experience optimizing latency and throughput in production systems.
• Solid Docker experience and understanding of production environments.
• Ability to make architectural decisions independently and take ownership.
• Strong engineering mindset: writing maintainable, scalable, production-grade code.
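To make the "predictable LLM behavior" requirement concrete, here is a minimal sketch, assuming a hypothetical response schema and a stub model call (neither is from the actual product): parse the model's reply as JSON, validate required typed fields, and retry generation when validation fails.

```python
import json

# Hypothetical target schema: field name -> required Python type.
SCHEMA = {"intent": str, "confidence": float}

def validate(raw: str) -> dict:
    """Parse a model reply and enforce the expected schema."""
    data = json.loads(raw)
    for field, ftype in SCHEMA.items():
        if not isinstance(data.get(field), ftype):
            raise ValueError(f"field {field!r} is not a valid {ftype.__name__}")
    return data

def generate_structured(generate, max_attempts=3):
    """Call `generate()` (a stand-in for an LLM call) until the output validates."""
    for _ in range(max_attempts):
        try:
            return validate(generate())
        except (json.JSONDecodeError, ValueError):
            continue  # guardrail tripped: regenerate
    raise RuntimeError("model never produced valid structured output")

# Demo: a stub "model" that emits garbage once, then valid JSON.
replies = iter(['not json', '{"intent": "refund", "confidence": 0.92}'])
print(generate_structured(lambda: next(replies)))  # → {'intent': 'refund', 'confidence': 0.92}
```

Real systems typically replace the hand-rolled check with Pydantic or JSON Schema validation and feed the validation error back into the retry prompt, but the validate-or-regenerate loop is the core guardrail pattern.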
🎯 Nice to Have:
• Experience with multi-agent systems and tool/function calling.
• Experience with local LLMs (Ollama, vLLM, GPU inference).
• Experience with real-time or voice systems.
• LLM observability (prompt tracing, evals, quality metrics).
• Cost optimization at scale for LLM usage.
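One common cost-optimization lever mentioned above is response caching: identical prompts (after normalization) hit a cache instead of the provider. A minimal sketch, where an in-memory dict stands in for a real cache such as Redis and the model call is a stub:

```python
import hashlib

_cache = {}  # in-memory stand-in for a shared cache

def cache_key(model: str, prompt: str) -> str:
    """Normalize whitespace and case so trivially different prompts share a key."""
    normalized = " ".join(prompt.split()).lower()
    return hashlib.sha256(f"{model}:{normalized}".encode()).hexdigest()

def cached_completion(model, prompt, call_model):
    """Return a cached reply when available; otherwise call the model and store it."""
    key = cache_key(model, prompt)
    if key not in _cache:
        _cache[key] = call_model(prompt)
    return _cache[key]

# Demo: the stub model is invoked only once for two equivalent prompts.
calls = []
def stub_model(prompt):
    calls.append(prompt)
    return "cached answer"

cached_completion("some-model", "Hello   world", stub_model)
cached_completion("some-model", "hello world", stub_model)
print(len(calls))  # → 1
```

Whether aggressive normalization is safe depends on the use case; for conversational agents the cache key usually also includes relevant context, not just the latest prompt.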
🎯 What Makes This Role Interesting:
• You will work on the core intelligence layer of the product — not just integrations.
• Real production challenges: high load, low latency, reliability requirements.
• Direct impact on system architecture and technical decisions.
• Fast execution cycle — minimal bureaucracy.
• Engineering-driven approach to LLMs (reliability, control, metrics — not just prompt tinkering).
• Strong engineering team focused on building real systems, not prototypes.
🎯 Who This Role Is NOT For:
• Candidates without real production experience with LLMs.
• Engineers who rely solely on frameworks without understanding the underlying mechanics.
• Developers without experience in high-load or latency-sensitive systems.
• People focused on quick hacks rather than on building reliable systems.
📩 If you want to join a team where everything moves fast, the work is exciting, and it is truly about AI, drop us a message.
Vacancy text reproduced without changes. Source: a Telegram channel.