This vacancy is archived

Company hidden
2 months ago

Senior Research Engineer (Multimodal & Video Foundation Model)

Work format
remote (Global)
Employment type
full-time
Grade
senior
English
C1

Job description

TL;DR

Senior Research Engineer (Multimodal & Video Foundation Model): Develop advanced multimodal and video-centric AI models, with an emphasis on scalable training pipelines, novel architectures, and generative video models. The role focuses on designing and optimizing large-scale multimodal systems, prototyping generative AI applications, and benchmarking model performance across diverse tasks.

Location: 100% remote worldwide

Company

hirify.global is a leading fintech product company with a global remote team. It pioneers blockchain-based financial solutions, including USDT, the world’s most trusted stablecoin.

What you will do

  • Pioneer multimodal and video-centric AI research contributing to prototypes and scalable systems.
  • Design and implement novel AI architectures integrating text, visual, and audio modalities.
  • Engineer scalable training and inference pipelines optimized for large-scale multimodal datasets and distributed GPU systems.
  • Optimize data processing, model execution, and pipeline throughput for efficiency.
  • Build modular tools for preprocessing and managing multimodal data assets.
  • Collaborate cross-functionally to translate model innovations into production-grade solutions.
  • Prototype generative AI applications showcasing new multimodal foundation model capabilities.
  • Develop benchmarking tools to evaluate model performance across diverse tasks.

Requirements

  • Bachelor’s degree in Computer Science, Computer Engineering, or related field, or equivalent experience.
  • Expertise in Python and PyTorch, with experience across the full development pipeline, from data processing to optimization.
  • Experience with large-scale text data; bonus for interleaved audio, video, image, and text data.
  • Hands-on experience developing or benchmarking LLMs, Vision Language Models, Audio Language Models, or generative video models.
  • First-author publications at leading AI conferences (CVPR, ICCV, ECCV, ICML, ICLR, NeurIPS).
  • Excellent English communication skills (C1+ required).

Nice to have

  • PhD in Computer Vision, Machine Learning, NLP, Computer Science, Applied Statistics, or related field.
  • Expertise in computer vision, video generation foundation models, and multimodal research.

Culture & Benefits

  • Fully remote work from anywhere worldwide.
  • Collaborate with a global team of top fintech and AI professionals.
  • Opportunity to work on cutting-edge AI and blockchain technologies.
  • Focus on innovation and pioneering new financial and AI solutions.