5 months ago
Multimodal Generative AI Researcher (LLM/VLM)
Job description
TL;DR
Multimodal Generative AI Researcher (LLM/VLM): Design and fine-tune large-scale vision-language models (VLMs) and large language models (LLMs) for multimodal tasks spanning vision, language, and 3D, bridging research breakthroughs with scalable engineering. The focus is on building robust training and evaluation pipelines, analyzing model performance, and publishing impactful research.
Location: Remote
Company
is a leading generative AI company focused on open-source AI models.
What you will do
- Design and fine-tune large-scale VLMs/LLMs for tasks such as visual reasoning, retrieval, 3D understanding, and embodied interaction.
- Build robust, efficient training and evaluation pipelines including data curation, distributed training, and scalable fine-tuning.
- Conduct in-depth analysis of model performance, including ablations, bias/robustness checks, and generalization studies.
- Collaborate across research, engineering, and 3D/graphics teams to bring models from prototype to production.
- Publish impactful research and help establish best practices for multimodal model adaptation.
Requirements
- PhD or equivalent experience in Machine Learning, Computer Vision, NLP, Robotics, or Computer Graphics.
- Proven track record in fine-tuning or training large-scale VLMs/LLMs for real-world downstream tasks.
- Strong engineering mindset to design, debug, and scale training systems end-to-end.
- Deep understanding of multimodal alignment and representation learning (e.g., vision–language fusion, CLIP-style pre-training, retrieval-augmented generation).
- Familiarity with recent trends, including video-language and long-context VLMs, spatio-temporal grounding, agentic multimodal reasoning, and Mixture-of-Experts (MoE) fine-tuning.
- Hands-on experience with PyTorch, DeepSpeed, Ray, and distributed or mixed-precision training.
Nice to have
- Experience integrating 3D and graphics pipelines into training workflows.
- Research or implementation experience with vision-language-action models or multimodal agents.
- Familiarity with efficient adaptation and deployment methods (e.g., LoRA, QLoRA, adapters, and other parameter-efficient fine-tuning techniques, as well as distillation for edge deployment).
- Knowledge of video and 4D generation trends, latent diffusion/rectified flow, or multimodal retrieval and reasoning pipelines.
- Background in GPU optimisation, quantisation, or model compression for real-time inference.
- Open-source contributions or publication track record in top-tier ML/CV/NLP venues.
Culture & Benefits
- Work remotely, pushing the frontier of multimodal AI models.
- Collaborative environment across research, engineering, and 3D/graphics teams.
- Commitment to equal employment opportunity.