
Senior Software Engineer (ML Data Platform)

Work format
hybrid
Work type
full-time
Grade
senior
English
C1
Country
Netherlands
Relocation
Netherlands

Job description

Senior Software Engineer - ML Data Platform

Conditions

Posted: Mar 25, 2026 · Employment type: Full-time · Experience level: Mid-Senior · Location: Netherlands · Category: Programming · Company: DuckDuckGoose AI

Location: Delft, the Netherlands (hybrid) · Type: Full-time · Start: ASAP

The internet has entered an era where reality is generatable. We build the infrastructure that helps institutions distinguish real from synthetic — at scale, protecting citizens, enterprises, and governments from synthetic media fraud. Everything you see and hear online can now be manipulated — our job is to make sure people can trust what they see. As part of our forensics platform team, you’ll work on the data backbone that makes large-scale detection possible, from ingestion and versioning to training, evaluation, and production.

You’ll join a small, senior team where your work will have immediate impact, and you’ll have ownership over the systems you build.

You’ll work on technically challenging problems such as:

  • Building dataset lineage for rapidly evolving generative models
  • Tracking model-family clusters across synthetic media types
  • Designing reproducible forensic benchmarks at scale
  • Managing large-scale image/video datasets with auditable provenance
  • Creating deterministic dataset builds for research and production environments
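
As an illustration of the last point, a deterministic dataset build typically assigns each item to a split from its stable ID alone, so rebuilding the dataset on any machine yields identical train/val/test sets. This is a minimal sketch, not the company's actual implementation; the ratios and ID format are assumptions:

```python
import hashlib

def split_of(item_id: str,
             ratios=(("train", 0.8), ("val", 0.1), ("test", 0.1))) -> str:
    """Assign an item to a split deterministically from its ID.

    The assignment depends only on the ID (not on iteration order or
    randomness), so every rebuild produces the same splits.
    """
    # Stable 64-bit hash of the ID, mapped to [0, 1).
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for name, ratio in ratios:
        cumulative += ratio
        if u < cumulative:
            return name
    return ratios[-1][0]  # guard against float rounding

assignments = {i: split_of(i) for i in ("img_0001", "img_0002", "img_0003")}
```

Because the split is a pure function of the ID, adding new items never reshuffles existing ones.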

What You’ll Drive

  • Data platform architecture: Define unified schemas, lineage, and dataset versioning for large image/video + context data.

  • Ingestion at scale: Build reliable pipelines from research repos, APIs, and internal generators; automate connectors and jobs.
  • Quality & governance: Implement deduplication, validation, health dashboards, and drift/coverage checks with auditable lineage.
  • Curation & access: Deliver one-command dataset builds, deterministic splits, and fast sampling tools for training/eval.
  • Performance & cost: Tune S3/object storage layouts, partitioning, and lifecycle policies for speed and spend.
  • Orchestration & ops: Productionize pipelines with CI/CD, containerization, scheduling/monitoring, and safe rollbacks.
  • Reliability & operations: Build for simplicity and observability; participate in a planned, compensated support rotation.
  • Engineering productivity: Create internal tools/CLIs, docs, and templates that make everyone faster.
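
The deduplication mentioned under "Quality & governance" often starts with exact content hashing before moving to perceptual methods. A minimal sketch of such a QC check, under the assumption of file-based inputs (the file names are illustrative):

```python
import hashlib
import tempfile
from pathlib import Path

def dedup_report(paths) -> dict:
    """Group files by exact SHA-256 content hash.

    Returns only the groups with more than one file, i.e. the
    byte-identical duplicates a QC gate would flag.
    """
    by_digest: dict[str, list[str]] = {}
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        by_digest.setdefault(digest, []).append(str(p))
    return {d: ps for d, ps in by_digest.items() if len(ps) > 1}

# Demo: three files, two of which share identical bytes.
tmp = Path(tempfile.mkdtemp())
(tmp / "a.jpg").write_bytes(b"\xff\xd8same-bytes")
(tmp / "b.jpg").write_bytes(b"\xff\xd8same-bytes")
(tmp / "c.jpg").write_bytes(b"\xff\xd8other-bytes")
dupes = dedup_report(sorted(tmp.iterdir()))
```

In CI, a gate like this would fail the build (or mark the dataset amber/red) when `dupes` is non-empty.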

Must haves

  • Strong software engineering foundation: Master’s in Computer Science, Data Engineering, or a related field.
  • Production experience: 5–8+ years building and operating data platforms for large unstructured datasets (images/video).
  • Data lifecycle ownership: Ingest → validate → catalog → version → sample/serve → monitor.
  • Pipelines & orchestration: Experience with modern schedulers (e.g., Airflow/Prefect) and containerized jobs.
  • Storage & formats: Hands-on with object storage (e.g., S3), columnar formats/partitioning, and performance tuning.
  • Versioning & lineage: Experience with dataset versioning and reproducibility (e.g., DVC/lakeFS/Delta or equivalents).
  • Quality at scale: Deduplication, schema/label checks, and automated QC gates in CI.
  • Security & privacy: IAM, access controls, and privacy-aware workflows suitable for regulated customers.
  • Domain awareness: Familiarity with digital forensics, misinformation threats, or synthetic media — and willingness to deepen expertise.
  • Flexibility: Comfortable moving between data engineering, infra, and tooling tasks when needed.
  • Mindset & delivery: Thrive in a fast-moving environment; proactive problem-solver; ship, measure, simplify.
  • Communication: Excellent written and verbal skills; explain complex ideas clearly.
  • Independence: Deliver quality work on time without constant oversight.
  • Language: Fluent in English.
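
The "Versioning & lineage" requirement boils down to recording, for every dataset version, a content hash plus the versions it was derived from. Tools like DVC or lakeFS do this for you; a hand-rolled sketch of the core record (names and fields are illustrative assumptions) looks like:

```python
import hashlib
import json

def lineage_record(name: str, content: bytes, parents=()) -> dict:
    """One lineage entry: a content-addressed version plus its ancestry.

    `parents` lists the sha256 digests of the versions this dataset
    was derived from, making provenance auditable end to end.
    """
    return {
        "name": name,
        "sha256": hashlib.sha256(content).hexdigest(),
        "parents": list(parents),
    }

raw = lineage_record("frames_raw_v1", b"raw frame bytes ...")
curated = lineage_record("frames_curated_v1", b"curated bytes ...",
                         parents=[raw["sha256"]])
manifest = json.dumps([raw, curated], indent=2)
```

Walking the `parents` chain from any production dataset back to its raw sources is exactly the auditable provenance the role calls for.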

Nice-to-haves

  • Streaming & events: Kafka/Kinesis or similar for near-real-time ingestion.
  • Vector search: Experience with embedding stores or similarity search at scale.
  • Synthetic data: Building pipelines to generate/stress-test rare scenarios.
  • Cloud & on-prem: Terraform/CDK, Kubernetes, and hybrid/on-prem data deployments.
  • FinOps: Cost monitoring and optimization for data workloads.
  • Technical track record: Strong GitHub, open-source contributions, publications, patents, or public talks.
  • Leadership: Mentoring and guiding technical direction.
  • Dutch language: Fluency is a plus.

Key Deliverables (First 90 Days)

  • A unified schema + catalog with key datasets onboarded, versioned, and reproducibly built via one command.
  • Automated QC gates (dedup/validation) with a red/amber/green dataset health dashboard and clear lineage.
  • Fast sampling/curation tools for the ML team, plus cost controls (storage layouts, lifecycle policies) in place.
  • Data migration: Inventory and migrate existing/legacy datasets into the new platform; reformat to the new schema, backfill metadata, validate checksums/lineage, and deprecate legacy paths with a rollback plan.
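
The checksum-validation step of such a migration can be as simple as diffing per-file digests between the legacy and new stores. A sketch, assuming both sides expose a path-to-digest mapping (the paths and digests below are made up for illustration):

```python
def verify_migration(legacy_checksums: dict, migrated_checksums: dict) -> list:
    """Return paths whose checksum is missing or changed after migration.

    An empty result means every legacy file arrived intact; a non-empty
    result is the signal to hold off on deprecating the legacy paths.
    """
    return sorted(
        path
        for path, digest in legacy_checksums.items()
        if migrated_checksums.get(path) != digest
    )

legacy = {"v1/a.mp4": "abc123", "v1/b.mp4": "def456"}
migrated = {"v1/a.mp4": "abc123", "v1/b.mp4": "fff999"}  # b.mp4 corrupted
problems = verify_migration(legacy, migrated)
```

Only when `problems` is empty would the rollback plan's "deprecate legacy paths" step fire.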

Compensation & benefits

  • Own the backbone: Define schemas, lineage, and dataset versioning used across research and production.
  • Company participation: Meaningful equity/virtual shares aligned with company growth.
  • Flexible work: Hybrid (Delft), flexible hours, minimal ceremony, async-first collaboration.
  • Data platform mandate: Real say in stack choices (orchestration, catalog, storage/layout) and time to implement them right.
  • Repro & auditability: Space to enforce deterministic builds, splits, and traceable lineage—no heroics needed.
  • Quality culture: Backing to implement dedup, drift/coverage checks, and dataset health dashboards org-wide.
  • FinOps mindset: Budget and support to balance speed, reliability, and total cost.
  • Pragmatic on-call: Planned, compensated rotation with automation-first recovery and rollback plans.
  • Growth path: IC track to Staff/Principal; opportunities to mentor and codify data standards.
  • Learning budget: Annual budget for courses/books + two data/ML-infra conferences per year.
  • Home office: Modest stipend for an ergonomic setup; commuting support (public transport or mileage).
  • Relocation + visa: Visa sponsorship and relocation support for internationals.

Join us and be part of a company committed to creating a more secure and trustworthy digital future. Apply today to become part of our mission-driven team!


The vacancy text is reproduced without changes

Source -