Company hidden
6 days ago

Senior Research Data Engineer (Canada)

159,100 - 176,700 CAD
Work format
remote (Canada only)
Employment type
fulltime
Grade
senior
English
B2
Country
Canada
Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Research Data Engineer (Canada) (AI Applied Research): Build and own the gold data layer that powers AI model development on clinical and operational data, with an emphasis on data semantics, point-in-time correctness, and reusable dataset pipelines. The role focuses on designing AI-ready datasets across modalities, automating quality checks, labeling, and synthesis, and maintaining versioned lineage so AI researchers can iterate quickly.

Company

hirify.global develops AI-enabled technology for clinical and operational care workflows.

What you will do

  • Own the gold data layer between the silver Lakehouse data and AI work, including building, validating, documenting, and extending gold datasets.
  • Reverse-engineer data semantics by analyzing SQL queries, stored procedures, technical definitions, ingestion behavior, provenance, and clinical event sequencing.
  • Bridge semantics with applied AI needs by designing gold datasets and documentation for efficient AI-first R&D.
  • Curate datasets across modalities (structured and unstructured) for generative AI/RAG, predictive ML, and classical/statistical model-ready tables.
  • Build reusable Databricks/Spark transformations with scheduled, observable workloads to support researcher iteration.
  • Automate quality, filtering, synthesis, and labeling (weak supervision, near-duplicate detection, noise/boilerplate removal, LLM-API synthetic data) and maintain versioned snapshots and lineage.

Requirements

  • Location: Remote from Canada
  • 5+ years building production data systems, with at least 2 supporting ML/AI workloads.
  • Advanced Python, SQL, and PySpark/Databricks; expert SQL for reading complex stored procedures and reverse-engineering business logic.
  • Strong Databricks ecosystem knowledge (Delta Lake, Unity Catalog, Spark/PySpark tuning, MLflow) and dataset versioning/lineage.
  • AI domain literacy: embeddings/tokenization, feature engineering, point-in-time correctness, train/validation/test splits, data drift, and differences between classical ML and generative models.
  • Experience with regulated/sensitive data under controlled access (HIPAA or equivalent) and familiarity with de-identification concepts.

Nice to have

  • Hands-on EHR data experience in skilled nursing/long-term care/post-acute care/senior living.
  • Clinical terminology and standards (ICD-10, SNOMED CT, LOINC; HL7v2, FHIR, CCDA).
  • dbt for transformation and testing.
  • Clinical NLP/OCR/document parsing or ASR/transcript pipeline experience.
  • Experience embedded inside an AI/ML research team; Master’s degree in a relevant quantitative/CS field.

Culture & Benefits

  • Remote role with in-office events requiring travel to Mississauga and/or Salt Lake City offices for onboarding, team events, and semi-annual/annual meetings.
  • CAD base salary range: $159,100–$176,700 plus bonus and benefits (not overtime eligible).
  • Focus on enabling AI researchers by providing reusable, versioned, documented gold datasets across the R&D lifecycle.

Hiring process

  • Recruiter shares details of the total rewards package during the hiring process.
