Senior Machine Learning Operations Engineer II (AI)

$148,000 – $216,000

Work format

remote (USA only)

Employment type

full-time

Grade

senior

English

Country

US/Canada

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Machine Learning Operations Engineer II (AI): Design and scale infrastructure and automated pipelines to reliably train, deploy, and monitor ML models in production, with an emphasis on CI/CD, distributed infrastructure, and system reliability. The role focuses on automating Continuous Training (CT) pipelines, optimizing GPU/CPU clusters, and implementing robust observability for data and concept drift.

Location: Remote (Must be based in the US or Canada)

Salary: $148,000 – $216,000 USD (US) / $171,500 – $201,000 CAD (Canada)

Company

hirify.global provides location sharing and safety services for families, serving nearly 100 million monthly active users globally.

What you will do

Design and manage automated CI/CD and Continuous Training (CT) pipelines for ML model development and delivery.
Containerize and scale ML models as high-availability microservices or batch processing workflows.
Establish unified logging, alerting, and monitoring to track model performance, latency, and data drift.
Provision and optimize cloud-based ML infrastructure using Infrastructure as Code (IaC) paradigms.
Collaborate with product teams to drive infrastructure adoption via SDK/API development and system maintenance.
Implement robust lineage tracking for data, code, and model artifacts to ensure compliance and reproducibility.

Requirements

5+ years of professional software engineering, DevOps, or data engineering experience.
2+ years specifically dedicated to building and maintaining MLOps infrastructure.
Strong proficiency in Python and software engineering best practices (unit testing, modular design).
Hands-on experience with Docker and Kubernetes (EKS, GKE).
Familiarity with ML lifecycle tools: MLflow, Kubeflow, SparkML, and Airflow.
Practical experience with major cloud ecosystems (AWS, GCP, or Databricks).

Nice to have

Experience implementing production feature stores (e.g., Feast, Tecton) and model registries.
Experience deploying and optimizing LLMs using frameworks like vLLM, Triton, or TGI.
Proficiency with Terraform for managing reproducible environments.
Familiarity with distributed computation engines such as Apache Spark, Ray, or Dask.
Relevant cloud or architecture certifications (e.g., AWS ML Specialty, CKA).

Culture & Benefits

Remote-first work environment with equipment and tool reimbursement.
Comprehensive medical, dental, and vision insurance (100% paid for US employees).
401(k) matching (US) and RRSP with DPSP (Canada).
Flexible PTO and 12 company-wide days off per year.
Learning and Development programs to support professional growth.
Free hirify.global Platinum Membership for the employee's circle.
