Data Engineer (Databricks)
Job description
TL;DR
Data Engineer (Databricks): Designing and building scalable data infrastructure for a Master Data Management platform, with an emphasis on high-performance data pipelines, entity resolution, and real-time analytics. Focus on optimizing Spark workloads, implementing Delta Live Tables, and developing idempotent data flows.
Location: Medellin, Colombia / Argentina
Company
A Databricks-native platform that unifies master data, product data, and relationships into a single AI-ready foundation powered by LLMs.
What you will do
- Design and develop scalable data pipelines within the Databricks Lakehouse Platform to support entity resolution and analytical reporting.
- Implement real-time data ingestion using Delta Live Tables (DLT) and Change Data Capture (CDC) patterns.
- Optimize PySpark workloads to ensure efficient processing of large-scale datasets and reduce latency.
- Design logical and physical data models using dimensional modeling (Kimball) and manage Slowly Changing Dimensions (SCDs).
- Collaborate with AI/ML Engineers and Data Scientists to integrate new data sources for intelligent platform features.
- Advocate for and implement best practices in version control, automated testing, and pipeline monitoring.
Requirements
- 5+ years of hands-on experience building large-scale data platforms in production.
- Deep expertise with Databricks Lakehouse Platform, including Delta Lake, Unity Catalog, and Workflows.
- Proven proficiency in Apache Spark and PySpark for complex data transformations.
- Experience building real-time pipelines, specifically using Delta Live Tables (DLT).
- Strong understanding of Change Data Capture (CDC) and Change Data Feed (CDF) patterns and dimensional modeling (Kimball).
- Ability to design reliable, testable, and idempotent data pipelines.
Nice to have
- Experience with Entity Resolution or Master Data Management (MDM) systems.
- Experience with AWS or Azure cloud platforms for data engineering.
- Knowledge of MLOps practices and integrating pipelines with ML workflows.
- Experience with CI/CD and Infrastructure as Code (Terraform).
Culture & Benefits
- Opportunity to work on a cutting-edge, Databricks-native AI foundation.
- Highly self-directed role within a fast-paced environment.
- Collaborative work with multidisciplinary teams including AI/ML Engineers and Data Scientists.
- Global staffing support and benefits provided by .