TL;DR
ML Data Engineer (AI): Develop and maintain scalable data pipelines for large-scale unstructured image and text datasets, with an emphasis on Kubernetes-based ingestion, preprocessing, and S3 object storage management. Focus on designing reliable, high-throughput pipelines and collaborating closely with ML engineers to improve model training data quality.
Location: London, UK, with Skilled Worker visa sponsorship available
Company
hirify.global is a London-based AI tools company focused on image generation for professional designers, illustrators, and marketers, with millions of users worldwide.
What you will do
- Develop and maintain data ingestion pipelines for large-scale image and text datasets from public sources.
- Manage end-to-end data flow including filtering, deduplication, validation, and preparation of training artifacts.
- Operate and improve a Kubernetes-based data-pipeline framework with distributed jobs, retries, and monitoring.
- Optimize S3-style object storage for efficient data layout, lifecycle, and throughput.
- Build tooling for pipeline observability including progress visualization, metrics, and alerts.
- Collaborate closely with ML engineers to align datasets with training needs and accelerate experimentation.
Requirements
- Location: Must be able to work onsite in London, UK.
- English: B2 level or higher required.
- Strong Python fundamentals with clean, production-ready code.
- Solid hands-on Kubernetes experience with containers and distributed processing.
- Proven experience with unstructured data, especially images, at scale.
- Experience with S3/object storage and efficient data handling.
Nice to have
- Familiarity with ML workflows and PyTorch.
- Experience with image quality scoring, captioning, or image-to-text pipelines.
- Experience with DAG/workflow visualization or pipeline UX tooling.
- DevOps skills including Docker, CI/CD, and infrastructure automation.
Culture & Benefits
- Competitive salary and equity.
- Skilled Worker visa sponsorship in the UK for qualified candidates.
- Direct impact on model quality through pipeline development.
- Autonomy with support from experienced ML peers.
- Modern technology stack including Python, Kubernetes, and S3.
- Fast-moving environment focused on shipping well-engineered systems.