15 hours ago

Senior LLM Infrastructure & Reliability Engineer (LLM)

Salary: $8,000
Work format: remote (Global)
Employment type: full-time
Grade: senior
English: B2
Country: UK
Vacancy from a Telegram channel -

Job description

#devops #LLM #remote

Senior LLM Infrastructure & Reliability Engineer (LLM, Python, Kubernetes, Linux)

Salary is up to 8,000 USD gross

REMOTE; B2B contract with the candidate's private legal entity, registered outside of Russia, Belarus, and Ukraine (no business incubators)

Location of the candidate – ANY

B2 English and Native Russian

London-based product company

We are seeking a systems-oriented engineer to build and operate a high-performance local LLM inference platform using a GPU-based cluster (gaming desktops). This role is dedicated to the serving, orchestration, reliability, and automated testing of open-source models (Gemma, Qwen, Whisper, etc.).
This is a systems infrastructure position focused on the operability of AI hardware and software stacks; it is not a model training or data science research role.

Key Responsibilities -
Service Operations: Run and scale LLM services using Ray/Ray Serve, Docker, and Linux.
Inference Management: Deploy models via vLLM/Hugging Face and expose them through high-performance, OpenAI-compatible APIs.
Performance Tuning: Optimize GPU utilization, request batching, latency, and throughput to maximize hardware efficiency.
Reliability Engineering: Maintain system stability through monitoring, robust failure handling, and automated recovery.
Lifecycle Management: Own the full model deployment cycle: versioning, upgrades, benchmarking, and rollbacks.
Agent Support: Provide infrastructure support for internal teams building agents, automation, and AI workflows.

Core Skills -
Foundations: Windows, Linux, Docker, CI/CD, Networking (Load Balancing, Routing, Service Discovery), Python, Kubernetes.
Inference Stack: vLLM, Hugging Face, Tokenization, Quantization.
Distributed Systems: Ray Clusters, Orchestration.
Hardware Ops: CUDA, VRAM Management, Multi-GPU setups.
Backend & APIs: FastAPI, Auth, Rate Limiting, Performance Tuning.
Observability: Real-time Metrics, Logging, Dashboarding (Grafana/Prometheus).

QA & Agent Collaboration -

Test Automation: Partner with the QA team to implement end-to-end automated testing for LLM workflows.
Prompt Engineering: Verify stability, availability, speed, and responsiveness of different models.
Evaluation: Support "LLM-as-a-judge" frameworks and automated evaluation pipelines to ensure output quality.
Agent Stability: Contribute to agent behavior stability via prompt engineering and constraint enforcement.
Benchmarking: Maintain prompt suites and regression benchmarks to catch performance dips early.
Framework Integration: Support integration with agent frameworks like LangGraph and n8n.
Feedback Loops: Enable telemetry-driven feedback to optimize prompts, routing, and model selection.

Note: This role is not focused on model training, fine-tuning, or data science research. We are building the production-grade plumbing required to make local AI systems fast, stable, and scalable.

The vacancy text is reproduced without changes

Source -