Senior Research Engineer (Multimodal & Video Foundation Model)

Work format

remote (Global)

Employment type

full-time

Grade

senior

English

Job description

TL;DR

Senior Research Engineer (Multimodal & Video Foundation Model): Develop advanced multimodal and video-centric AI models, with an emphasis on scalable training pipelines, novel architectures, and generative video models. The role centers on designing and optimizing large-scale multimodal systems, prototyping generative AI applications, and benchmarking model performance across diverse tasks.

Location: 100% remote worldwide

Company

hirify.global is a leading fintech product company with a global remote team, pioneering blockchain-based financial solutions, including the world’s most trusted stablecoin, USDT.

What you will do

Pioneer multimodal and video-centric AI research that feeds into prototypes and scalable systems.
Design and implement novel AI architectures integrating text, visual, and audio modalities.
Engineer scalable training and inference pipelines optimized for large-scale multimodal datasets and distributed GPU systems.
Optimize data processing, model execution, and pipeline throughput for efficiency.
Build modular tools for preprocessing and managing multimodal data assets.
Collaborate cross-functionally to translate model innovations into production-grade solutions.
Prototype generative AI applications showcasing new multimodal foundation model capabilities.
Develop benchmarking tools to evaluate model performance across diverse tasks.

Requirements

Bachelor’s degree in Computer Science, Computer Engineering, or related field, or equivalent experience.
Expertise in Python and PyTorch, with experience across the full development pipeline, from data processing to optimization.
Experience with large-scale text data; experience with interleaved audio, video, image, and text data is a bonus.
Hands-on experience developing or benchmarking LLMs, Vision Language Models, Audio Language Models, or generative video models.
First-author publications at leading AI conferences (CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS).
Excellent English communication skills (C1+ required).

Nice to have

PhD in Computer Vision, Machine Learning, NLP, Computer Science, Applied Statistics, or related field.
Expertise in computer vision, video generation foundation models, and multimodal research.

Culture & Benefits

Fully remote work from anywhere worldwide.
Collaborate with a global team of top fintech and AI professionals.
Opportunity to work on cutting-edge AI and blockchain technologies.
Focus on innovation and pioneering new financial and AI solutions.