5 hours ago

Senior or Staff ML Systems Engineer (LLM)

$200,000 - $275,000

Work format

remote (USA only)

Employment type

full-time

Grade

senior

English

Country

US/Canada

Job posting from Telegram channel -

Job description

Senior or Staff ML Systems Engineer, LLMs

Company

TRM Labs

Conditions

1 day ago | Lead | Salary: 200K - 275K | North America | Remote | Full Time | AI Jobs by TRM Labs

Skills

Tracing, Agent Evaluation, Vector Database, Feature Store, LangChain, Terraform, Monitoring, Infrastructure, Observability, CI/CD, MLOps, LLM, Model Versioning, Model Registry, LlamaIndex, vLLM, BentoML, Triton, Drift Detection, Python, Docker, Kubernetes

About the Role

You will build and scale the technical infrastructure that powers large language models and agentic systems. You will create reusable CI/CD workflows for model training, evaluation, and deployment, automate model versioning and approval workflows, and implement compliance checks. You will design and operate modular AI infrastructure—vector databases, feature stores, model registries, and observability tooling—and embed models and agents into real-time applications. You will continuously evaluate and integrate state-of-the-art tools, monitor cost, latency, and performance, and run offline and online evaluation pipelines including regression tests and human-in-the-loop workflows. You will enable researchers by providing sandboxes, dashboards, and reproducible environments, and ensure data accuracy and reliability for model training and inference.

Requirements

Write high-quality, maintainable software, primarily in Python
Strong background in scalable infrastructure, including containerization and orchestration (Docker, Kubernetes)
Experience with infrastructure as code and deployment (Terraform, CI/CD pipelines)
Familiarity with monitoring and logging frameworks (Datadog, Prometheus, OpenTelemetry)
Knowledge of MLOps best practices, including model versioning, rollback strategies, automated evaluation, and drift detection
Experience with scalable model and agent serving infrastructure (vLLM, Triton, BentoML)
Experience deploying and maintaining LLM and agentic workflows in production, including monitoring cost, latency, and performance
Ability to capture traces for analysis and optimize prompt-response flows with real-time data access
Strong ownership, pragmatism, and ability to balance infrastructure elegance with iterative delivery

Responsibilities

Build reusable CI/CD workflows for model training, evaluation, and deployment
Automate model versioning, approval workflows, and compliance checks
Design and maintain modular, scalable AI infrastructure, including vector databases, feature stores, model registries, and observability tooling
Embed AI models and agents into real-time applications and workflows
Evaluate and integrate state-of-the-art AI tools and libraries
Drive AI reliability and governance, and ensure compliance, security, and uptime
Deploy infrastructure for offline and online evaluation, including regression testing, cost monitoring, and human-in-the-loop workflows
Provide sandboxes, dashboards, and reproducible environments to accelerate research
Ensure data accuracy, consistency, and reliability for model training and inference

Benefits

Equity plan
Remote work
