Company hidden
8 hours ago

MLOps Engineer (AI)

$100,000 - $200,000
Work format
hybrid
Employment type
full-time
English
B2
Country
France/UK
Relocation
France

Job description

TL;DR

MLOps Engineer (AI): Own the infrastructure that takes trained AI models to production, including rollout pipelines, quality and latency gates, canary deployments, and dashboards. Focus on building production-safe model serving systems, debugging complex GPU and Kubernetes environments, and optimizing serving cost and reliability.

Location: Hybrid in Paris or London with relocation support available for Paris only

Salary: $100K – $200K plus equity

Company

hirify.global is an AI Safety startup building safety, reliability, and optimization layers for AI systems, processing over 100M API calls monthly and developing proprietary LLMs.

What you will do

  • Integrate new text and multimodal models into production serving paths and verify behavior under production-like traffic.
  • Build and maintain rollout pipelines for frequent model releases with smoke, quality, and performance gates.
  • Operate local and cluster GPU deployments on Kubernetes and run A/B and canary rollouts for model and config changes.
  • Build dashboards monitoring latency, throughput, GPU usage, fallback rate, and quality drift.
  • Debug production issues across the full stack including model config, tokenizer, serving API, router, queue, Kubernetes, GPU runtime, and CI jobs.
  • Optimize serving cost and reliability across mixed GPU capacity.

Requirements

  • Experience with inference serving engines like SGLang, vLLM, Dynamo, or TensorRT-LLM and understanding of request lifecycle components.
  • Solid Kubernetes GPU experience including NVIDIA device plugin, GPU scheduling, resource management, node affinity, taints, and tolerations.
  • Understanding of multi-node communication libraries, CUDA runtime, container runtime compatibility, and debugging across these layers.
  • Ability to design and implement CI/CD for model serving including versioning, smoke tests, quality regression tests, latency/throughput gates, canary rollout, and rollback.
  • Strong observability skills to define dashboards and alerts for model promotion decisions.
  • Production debugging skills across the stack, from Rust code to Kubernetes configuration, and clear communication of engineering tradeoffs.
  • Location: Must be able to work hybrid in Paris or London; relocation support is available for Paris only.

Nice to have

  • Rust backend experience.
  • Experience with NCCL, UCX, NVSHMEM, RDMA, InfiniBand, RoCE, or EFA.
  • Familiarity with ClickStack, Datadog, Terraform for GPU infrastructure, DCGM exporter, Prometheus, OpenTelemetry.
  • Experience with high model rollout cadence (2–3 releases per week).

Culture & Benefits

  • Paid time off in line with local regulations regardless of work location.
  • Hybrid work from Paris with relocation package or work from London (no relocation support).
  • Comprehensive medical insurance for the France-based team; medical insurance for the UK office is being set up.
  • All necessary hardware, tools, and services provided.
  • Subscriptions covered for AI agents and IDEs.
  • Team off-sites twice a year in locations like the Alps and Saint-Tropez.

Hiring process

  • Introductory call with HR (25 minutes).
  • Take-home test task.
  • Technical interview with Head of Applied Research (60 minutes).
  • Final conversation with CEO (45 minutes).
