Posted 3 hours ago

Principal Engineer (ML Platform)

Work format: remote (Europe only)
Employment type: full-time
Grade: senior
English: B2
Country: Europe
Listed in Hirify Global, a list of international tech companies.

Job description

TL;DR

Principal Engineer (ML Platform): Build and operate the ML platform systems that train, evaluate, and serve generative models in production, with an emphasis on reliability, scalability, performance, and resource efficiency in GPU and cloud environments. The focus is on designing platform architecture across research and product workflows, improving scheduling, monitoring, and debugging, and creating automation-friendly, agent-oriented tooling that reduces operational overhead.

Location: Remote (Europe)

Company

Synthesia develops an AI video platform for business and enterprise skill development.

What you will do

  • Design and improve platform systems for model training, evaluation, and production serving.
  • Build infrastructure and tooling to make ML workloads more reliable, scalable, and cost-efficient.
  • Develop internal tools and workflows that are easy for both humans and agents to operate.
  • Work on architecture for deploying, serving, and operating models across research and product environments.
  • Improve scheduling, monitoring, and debugging for GPU and cloud workloads.
  • Drive improvements across observability, automation, reliability, and developer experience.

Requirements

  • Strong experience building or operating production systems with a focus on reliability, scalability, and maintainability.
  • Systems mindset: think in terms of bottlenecks, failure modes, interfaces, resource usage, and long-term operability.
  • Hands-on experience with cloud infrastructure, Linux, and infrastructure automation.
  • Experience with Kubernetes and operating distributed workloads in production.
  • Strong coding skills, ideally in Python or similar languages used for backend systems and tooling.
  • Experience building internal platforms, developer tooling, or infrastructure abstractions used by other engineers.

Nice to have

  • Experience operating ML infrastructure or model serving systems in production.
  • Experience with observability and debugging in distributed systems.
  • Familiarity with Terraform, Datadog, GitHub Actions, or similar tools.
  • Experience building agentic or LLM-powered internal tools and workflow orchestration (e.g., Temporal).
  • Familiarity with performance optimization, scheduling, or resource allocation problems.

Culture & Benefits

  • Hands-on IC role with significant ownership and influence over technical direction.
  • Close collaboration with researchers and product engineers to turn pain points into robust platform capabilities.
  • Focus on pragmatic architectural tradeoffs as the platform scales.
  • Emphasis on automation, reliability, and developer experience to reduce operational overhead.

Hiring process

  • Interviews focused on production systems thinking, ML platform architecture, and reliability/scalability tradeoffs.
  • Technical evaluation of hands-on experience with cloud, Linux, Kubernetes, and ML serving/training operations.
  • Discussion of how your experience translates to building automation-friendly, agent-oriented platform tooling.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →