Company hidden
Posted 19 hours ago

Senior Machine Learning Operations Engineer II (AI)

$148,000 - $216,000
Work format
remote (USA only)
Employment type
full-time
Grade
senior
English
B2
Country
US/Canada
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Machine Learning Operations Engineer II (AI): Design and scale infrastructure and automated pipelines to reliably train, deploy, and monitor ML models in production, with an emphasis on CI/CD, distributed infrastructure, and system reliability. The focus is on automating Continuous Training (CT) pipelines, optimizing GPU/CPU clusters, and implementing robust observability for data and concept drift.

Location: Remote (Must be based in the US or Canada)

Salary: $148,000 – $216,000 USD (US) / $171,500 – $201,000 CAD (Canada)

Company

hirify.global provides location sharing and safety services for families, serving nearly 100 million monthly active users globally.

What you will do

  • Design and manage automated CI/CD and Continuous Training (CT) pipelines for ML model development and delivery.
  • Containerize and scale ML models as high-availability microservices or batch processing workflows.
  • Establish unified logging, alerting, and monitoring to track model performance, latency, and data drift.
  • Provision and optimize cloud-based ML infrastructure using Infrastructure as Code (IaC) paradigms.
  • Collaborate with product teams to drive infrastructure adoption via SDK/API development and system maintenance.
  • Implement robust lineage tracking for data, code, and model artifacts to ensure compliance and reproducibility.

Requirements

  • 5+ years of professional software engineering, DevOps, or data engineering experience.
  • 2+ years specifically dedicated to building and maintaining MLOps infrastructure.
  • Strong proficiency in Python and software engineering best practices (unit testing, modular design).
  • Hands-on experience with Docker and Kubernetes (EKS, GKE).
  • Familiarity with ML lifecycle tools: MLflow, Kubeflow, SparkML, and Airflow.
  • Practical experience with major cloud ecosystems (AWS, GCP, or Databricks).

Nice to have

  • Experience implementing production feature stores (e.g., Feast, Tecton) and model registries.
  • Experience deploying and optimizing LLMs using frameworks like vLLM, Triton, or TGI.
  • Proficiency with Terraform for managing reproducible environments.
  • Familiarity with distributed computation engines such as Apache Spark, Ray, or Dask.
  • Relevant cloud or architecture certifications (e.g., AWS ML Specialty, CKA).

Culture & Benefits

  • Remote-first work environment with equipment and tool reimbursement.
  • Comprehensive medical, dental, and vision insurance (100% paid for US employees).
  • 401(k) matching (US) and RRSP with DPSP (Canada).
  • Flexible PTO and 12 company-wide days off per year.
  • Learning and Development programs to support professional growth.
  • Free hirify.global Platinum Membership for the employee's circle.
