This job posting is archived
Senior Data Engineer
Job description
TL;DR
Senior Data Engineer (AI/Medtech): Owning the data pipelines and infrastructure that turn raw data into metrics and insights, driving product decisions and research, with an emphasis on reliability, performance, and data quality. Focus on analyzing and tuning Spark workloads, handling breaking upstream changes, and enforcing rigorous data quality standards.
Location: Must be authorized to work in the US without visa sponsorship and be located in the New York City, Los Angeles, or San Francisco metro area. Hybrid work: expected in the office on Tuesdays and Thursdays, potentially more often during the onboarding period. Relocation assistance is provided to anyone who does not already reside in the NYC metro area.
Salary: $165,000 - $220,000 a year
Company
is an AI-powered Proactive Documentation platform that advances how care is delivered by reviewing all patient data in the EHR to recommend diagnoses and surface clinical evidence.
What you will do
- Collect, model, and consolidate data into the data platform to support analytics, ML development, and research initiatives.
- Design, build, and evolve data models and pipelines that reliably transform and deliver data to downstream consumers.
- Own data quality in collaboration with engineering teams, ensuring datasets are trustworthy and production-ready.
- Partner closely with product to deliver analytics and actionable insights to internal and external stakeholders.
- Own the reliability and day-to-day operation of the data platform and its pipelines through proactive monitoring, alerting, and operational management.
Requirements
- Bachelor's degree in Computer Science, Mathematics, Statistics, or a related field, or equivalent practical experience.
- 5+ years of experience in data engineering roles.
- 3+ years of experience using PySpark to build data pipelines.
- 3+ years of experience in public cloud provider technologies (AWS tooling such as S3, EMR, or Athena).
- Strong proficiency in Python and SQL.
- Willingness to participate in on-call operational support for owned systems.
Nice to have
- Experience with one or more of the following technologies: Apache Iceberg, Dagster, ClickHouse, PostgreSQL, FastAPI, Metabase.
- Experience working with healthcare data, including HIPAA compliance, data de-identification, and familiarity with open data standards such as OMOP CDM.
- Experience building and supporting data pipelines for ML workflows, including model training, validation, deployment, and ongoing performance evaluation.
Culture & Benefits
- Eligible for equity.
- 99% employer-paid health benefits (Medical, Dental, and Vision) + One Medical subscription.
- 18 PTO days/yr + 1 week holiday break.
- Monthly health & wellness budget.
- Company-sponsored team retreat + social events.
- A sabbatical program.