TL;DR
Senior Machine Learning Engineer (AI Engineering): build and scale next-generation world model architectures and bridge them into high-throughput training infrastructure that enables synthetic data and simulation. The role focuses on designing systems to acquire, process, and curate multimodal data at scale, turning raw experience into high-quality datasets.
Location: London, United Kingdom
Company
hirify.global is the leading developer of Embodied AI technology.
What you will do
- Design and implement large-scale data acquisition, processing, and curation pipelines for high-quality datasets.
- Improve dataset quality and utility through sophisticated data analysis, debugging, and experimentation.
- Develop and scale multimodal data pipelines for ingestion, preprocessing, filtering, annotation, and storage across video, LiDAR, and telemetry modalities.
- Run systematic experiments on data ablations and composition to assess their impact on model training dynamics.
- Collaborate with ML researchers and platform engineers to ensure datasets are fit for purpose and efficiently integrated into large-scale training workflows.
- Build internal tools and workflows for dataset auditing, visualization, and versioning to streamline iteration and reproducibility.
Requirements
- Experience in ML engineering, data engineering, or applied ML roles focused on large-scale data systems.
- Proven experience building and maintaining large-scale data pipelines for machine learning.
- Strong Python fundamentals and experience with modern ML and data frameworks (e.g., PyTorch, Ray, Dask, Spark).
- Solid understanding of multimodal data (video, LiDAR, sensor telemetry) and the challenges it poses in large-scale training.
- Experience defining and tracking data quality metrics, conducting dataset analysis, and driving data-informed improvements in model performance.
- Demonstrated ability to work collaboratively with ML researchers, platform engineers, and product teams in a fast-paced, experimental environment.
Nice to have
- Exposure to large-scale storage, distributed training systems, or cloud compute environments (Azure, AWS, GCP).
- Experience designing high-throughput, distributed data pipelines.
- Familiarity with data versioning, lineage, and governance tools.
- Experience in AVs, robotics, simulation, or other embodied AI domains.
- Familiarity with foundation models, generative models, or simulation-based data pipelines.
Culture & Benefits
- Shape the future of embodied AI through data.
- Tackle data challenges at unprecedented scale.
- Collaborate with world-class talent.
- Make your mark on real-world autonomy.
- Work in a high-trust, high-autonomy environment.