Senior Bioinformatics Data Engineer (Pharma)
Job description
TL;DR
Senior Bioinformatics Data Engineer (AWS/Python): Building and maintaining Dagster-orchestrated ingestion pipelines for genomics and clinical data, with an emphasis on dbt transformations and lakehouse architecture. Focus on automating AI-native engineering workflows, optimizing Redshift performance, and ensuring clinical data reproducibility.
Location: United States. Remote work is supported, but candidates within commuting distance of offices are encouraged to work on a hybrid basis.
Company
is a consulting firm providing regulatory, clinical, and R&D technology solutions to empower biotech, med device, and pharmaceutical organizations.
What you will do
- Build and maintain Dagster-orchestrated ingestion pipelines for genomics vendors, including IO managers and Iceberg writers.
- Develop and harden dbt Silver-to-Gold transformations, including real-data test coverage and macro consolidation.
- Implement clinical data ingestion paths (SDTM and ADaM), reconciliation logic, and subject-dimension routing.
- Deliver platform infrastructure using FastAPI endpoints, CI/CD pipelines, and Redshift performance tuning.
- Extract transformation rules from legacy R and PySpark code to reconcile against new platform implementations.
- Turn repetitive processes into automated workflows and guardrails to ensure high reproducibility standards.
Requirements
- 5+ years of professional experience in data engineering with shipped production pipelines on AWS (S3, ECS/Fargate, Redshift).
- AI-native engineering practice: demonstrated experience building systems around AI coding agents (e.g., Claude Code, Cursor).
- Strong proficiency in Python, SQL, dbt, and workflow orchestration tools (Dagster, Airflow, or Prefect).
- Solid understanding of lakehouse architecture patterns and schema design for complex multi-modal datasets.
- Bachelor's or Master's degree in Computer Science, Data Engineering, Bioinformatics, or a related field.
- Ability to handle PHI-adjacent clinical data under contractor policies (background check, VPN access).
Nice to have
- Direct experience with Apache Iceberg, AWS Glue Catalog, or lakehouse table formats.
- Comfort reading genomic data (VAF, HGVS, VCFs, CNV/fusion semantics).
- Familiarity with CDISC clinical data standards, including SDTM and ADaM.
- Background in pharma, clinical research, or life sciences.
- Proficiency in R for interoperability with bioinformatics teams.
- Experience with Docker/ECS and infrastructure-as-code (CloudFormation).
Culture & Benefits
- Commitment to diversity, equity, and inclusion, providing a safe space for all employees to succeed.
- Support for remote working with optional hybrid collaboration for those near office locations.
- Personal review of all applications by the recruitment team without the use of AI screening tools.
- Guaranteed outcome communication for every applicant.