TL;DR
Senior AI Engineer (Node.js, LLM): Build and optimize AI infrastructure for production-ready LLM experiences, with an emphasis on advanced prompt systems, structured outputs, and complex LLM workflows. The role focuses on observability, debugging, evaluation, and data-driven decisions about model performance, reliability, and cost, using tools such as Langfuse and OpenRouter.
Location: Remote. Applicants from any country are welcome, provided they are located in a time zone within approximately ±4 hours of CET.
Company
hirify.global is a leading tech company creating innovative consumer products across health, education, and entertainment industries.
What you will do
- Design complex, dynamic prompt templates with conditional logic and efficiently reuse information and context within prompts to maximize generation quality and reasoning.
- Implement structured-output mechanisms (JSON mode, function calling, Zod/JSON schemas) so that AI outputs are predictable and integrate cleanly into application logic.
- Build robust evaluation pipelines and use Langfuse to collect feedback and score the quality of responses in real time.
- Perform deep debugging of complex LLM chains using Langfuse traces to identify bottlenecks and optimize for cost, latency, and context window usage.
- Run systematic experiments across different models via OpenRouter (e.g., comparing Claude 3.5 Sonnet vs. GPT-4o) and analyze results based on quantitative metrics.
- Make deployment decisions for new prompts or models strictly based on quantitative benchmarks and trace data, rather than intuition.
Requirements
- Deep knowledge of the Node.js and Next.js stack to build reliable services and handle complex LLM-generated data.
- Proven experience in building prompts where content is highly dependent on input variables and context injection.
- Experience working with unified APIs like OpenRouter, managing rate limits, and selecting the most cost-effective models for specific tasks.
- Understanding of LLM observability principles: setting up tracing, creating test datasets, and integrating scoring systems using Langfuse (or a similar tool).
- Experience with evaluation frameworks like RAGAS or building custom “LLM-as-a-judge” systems.
- An analytical mindset and the ability to transform raw generation logs into actionable business metrics and technical insights.
Nice to have
- Practical experience in fine-tuning models for specific domain tasks or JSON compliance.
- Understanding of how to build and optimize Retrieval-Augmented Generation (RAG) systems, including indexing, retrieval, and re-ranking.
- Basic knowledge of Python for working with data science scripts or AI evaluation libraries.
Culture & Benefits
- Embrace the freedom of a remote work environment, promoting a healthy work-life balance.
- Enjoy unlimited paid time off to recharge and prioritize your well-being.
- Celebrate and relax on national holidays with paid time off.
- Experience seamless productivity with top-notch Apple MacBooks provided to all employees who need them.
- Unlock the benefits of flexibility, autonomy, and entrepreneurial opportunities with a flexible Independent Contractor Agreement.
Hiring process
- Recruiter Screening (40 minutes).
- Technical Interview (60 minutes).
- Final Interview (30 minutes).