Job posting from a Telegram channel
A great vacancy

A narrowly specialized LLMOps role with a cutting-edge stack and a very clear separation of responsibilities. Ideal for those who gravitate toward the infrastructure side of AI rather than research.
REMOTE; B2B contract with the candidate's own private legal entity, registered outside Russia, Belarus, and Ukraine (no business incubators)
Location of the candidate – ANY
B2 English and native Russian
A London-based product company
We are seeking a systems-oriented engineer to build and operate a high-performance local LLM inference platform using a GPU-based cluster (gaming desktops). This role is dedicated to the serving, orchestration, reliability, and automated testing of open-source models (Gemma, Qwen, Whisper, etc.).
This is a systems infrastructure position focused on the operability of AI hardware and software stacks; it is not a model training or data science research role.
Key Responsibilities

Service Operations: Run and scale LLM services using Ray/Ray Serve, Docker, and Linux.
Inference Management: Deploy models via vLLM/Hugging Face and expose them through high-performance, OpenAI-compatible APIs.
Performance Tuning: Optimize GPU utilization, request batching, latency, and throughput to maximize hardware efficiency.
Reliability Engineering: Maintain system stability through monitoring, robust failure handling, and automated recovery.
Lifecycle Management: Own the full model deployment cycle: versioning, upgrades, benchmarking, and rollbacks.
Agent Support: Provide infrastructure support for internal teams building agents, automation, and AI workflows.
Test Automation: Partner with the QA team to implement end-to-end automated testing for LLM workflows.
Prompt Engineering: Verify the stability, availability, speed, and responsiveness of different models.
Evaluation: Support "LLM-as-a-judge" frameworks and automated evaluation pipelines to ensure output quality.
Agent Stability: Contribute to agent behavior stability via prompt engineering and constraint enforcement.
Benchmarking: Maintain prompt suites and regression benchmarks to catch performance dips early.
Framework Integration: Support integration with agent frameworks like LangGraph and n8n.
Feedback Loops: Enable telemetry-driven feedback to optimize prompts, routing, and model selection.
Note: This role is not focused on model training, fine-tuning, or data science research. We are building the production-grade plumbing required to make local AI systems fast, stable, and scalable.
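The evaluation and benchmarking duties listed above reduce to a simple loop: score a prompt suite with a judge, compare against a stored baseline, and flag any dips. A hedged sketch of that loop, with a trivial stub in place of a real LLM-as-a-judge call (all names and the scoring rule are illustrative assumptions):

```python
def stub_judge(prompt: str, answer: str) -> float:
    """Stand-in for an LLM-as-a-judge call; returns a 0..1 quality score.
    Here: full score if the answer echoes the prompt's last word."""
    return 1.0 if prompt.split()[-1] in answer else 0.0

def run_suite(suite, model, judge=stub_judge):
    """Score every case in the prompt suite against the given model."""
    return {case["id"]: judge(case["prompt"], model(case["prompt"]))
            for case in suite}

def regressions(scores, baseline, tolerance=0.05):
    """Case IDs whose score dropped more than `tolerance` below baseline."""
    return [cid for cid, s in scores.items()
            if baseline.get(cid, 0.0) - s > tolerance]

suite = [
    {"id": "greet", "prompt": "echo hello"},
    {"id": "place", "prompt": "echo world"},
]
baseline = {"greet": 1.0, "place": 1.0}

good_model = lambda p: p          # echoes the prompt; judge scores 1.0
bad_model = lambda p: "garbage"   # judge scores 0.0

assert regressions(run_suite(suite, good_model), baseline) == []
print(regressions(run_suite(suite, bad_model), baseline))  # ['greet', 'place']
```

In a real pipeline the judge would be a second model call, the baseline would live in version control next to the prompt suite, and a non-empty regression list would block the model upgrade or trigger a rollback.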