Company hidden
Posted 42 minutes ago

Multimodal AI Model Optimization Research Engineer

Work format: remote (Global)/hybrid
Employment type: full-time
Grade: senior
English: B2
Country: UK/US
Relocation: US

Job description

TL;DR

Multimodal AI Model Optimization Research Engineer (AI Engineering): Optimize and productionize cutting-edge multimodal AI models through sparsification, distillation, and quantization. The role centers on designing efficient model architectures, benchmarking trade-offs, and working closely with researchers and engineers to deploy scalable AI systems.

Location: Hybrid in San Francisco with relocation support; remote candidates also considered (London/Europe acceptable)

Company

hirify.global is a Series B startup pioneering multimodal AI to enable natural human-AI interaction through audio-visual avatar behavior and conversational video experiences across multiple industries.

What you will do

  • Optimize research AI models for production using sparsification, distillation, and quantization techniques
  • Define metrics, run experiments, and benchmark latency, cost, and quality trade-offs
  • Collaborate with researchers and engineers to implement deployable AI systems
  • Manage the full optimization lifecycle of key multimodal AI models

Requirements

  • Location: Hybrid in San Francisco preferred with relocation support; remote candidates considered
  • Strong experience in deep learning with PyTorch and model optimization techniques
  • Knowledge of efficient architectures, inference performance, and GPU fundamentals
  • Proficiency in Python and strong research engineering skills
  • Experience with large models and datasets in cloud environments
  • Ability to read and reproduce ML research papers and communicate effectively

Nice to have

  • Experience optimizing diffusion, video/audio generative, or large language models
  • Familiarity with real-time or streaming systems such as low-latency APIs and WebRTC
  • Knowledge of TensorRT, ONNX Runtime, TVM, Triton, or XLA
  • Experience writing custom CUDA kernels and low-level performance tuning
  • Expertise in experiment tracking, benchmarking, and profiling at scale
  • Prior research engineering or applied science experience

Culture & Benefits

  • Flexible work schedules and unlimited PTO
  • Competitive healthcare and gear stipends
  • Collaborative environment focused on learning and impact
  • Emphasis on diversity and culture creation
