Company hidden
Posted 4 days ago

Member of Technical Staff (LLM Inference)

$220,800–$331,200
Work format
hybrid
Employment type
fulltime
Grade
senior
English
B2
Country
US

Job description


TL;DR

Member of Technical Staff (LLM Inference): Build and optimize tools and systems for LLM inference that empower AI researchers, with a focus on compute efficiency, distributed systems, and deploying cutting-edge research. The role centers on optimizing generative AI architectures, debugging performance bottlenecks, and improving team productivity for production deployment.

Location: New York, United States. Employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles (U.S.) of that location.

Salary: USD $220,800–$331,200 per year (for New York City metropolitan area, IC6 role).

Company

hirify.global is a newly formed organization dedicated to advancing consumer AI products and research, responsible for Copilot, Bing, Edge, and AI research.

What you will do

  • Implement frontier AI research ideas alongside researchers and engineers.
  • Introduce new systems, tools, and techniques to improve model inference performance.
  • Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues.
  • Establish processes and build tools to enhance the team’s collective productivity.
  • Find ways to overcome roadblocks and deliver your work to users quickly and iteratively.

Requirements

  • Bachelor’s Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
  • Understand modern generative AI architectures and how to optimize them for inference.
  • Be familiar with the internals of open-source inference frameworks like vLLM and SGLang.
  • Value clear communication, improving team processes, and being a supportive team player.
  • Be results-oriented, have a bias toward action, and enjoy owning problems end-to-end.
  • English: B2 required.
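To make the inference-optimization requirement above concrete, here is a toy sketch (plain Python, all names illustrative and not from the posting) contrasting naive autoregressive decoding, which re-encodes the whole prefix at every step, with KV-cache decoding, which reuses previously computed key/value entries — the core optimization behind frameworks like vLLM and SGLang. Only operation counts are modeled, not real attention math.

```python
# Toy illustration of why a KV cache matters for autoregressive decoding.
# "encode" stands in for the per-token key/value projection work that real
# transformer layers perform; we only count how often it runs.

def decode_without_cache(prompt, steps):
    """Re-encode the entire sequence at every generation step."""
    seq, encode_calls = list(prompt), 0
    for _ in range(steps):
        encode_calls += len(seq)      # recompute K/V for every token so far
        seq.append("tok")             # emit the next token
    return encode_calls

def decode_with_cache(prompt, steps):
    """Encode the prompt once (prefill), then one new token per step."""
    cache, encode_calls = [], 0
    for tok in prompt:                # prefill: encode each prompt token once
        cache.append(tok)
        encode_calls += 1
    for _ in range(steps):
        encode_calls += 1             # only the newly generated token
        cache.append("tok")
    return encode_calls

prompt = ["a"] * 8
print(decode_without_cache(prompt, 4))   # 8+9+10+11 = 38 encodes
print(decode_with_cache(prompt, 4))      # 8 prefill + 4 decode = 12 encodes
```

The quadratic-versus-linear gap shown here is why cache layout and memory management (e.g. vLLM's paged KV cache) dominate production inference performance.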

Nice to have

  • Master’s Degree in Computer Science or related technical field AND 8+ years of technical engineering experience, OR Bachelor’s Degree and 12+ years of experience.
  • Experience with generative AI and distributed computing.
  • Python and Python ecosystem expertise (e.g., uv, pybind/nanobind, FastAPI).
  • Experience with large-scale production inference and GPU kernel programming.
  • Experience benchmarking, profiling, and optimizing PyTorch generative AI models.
  • Working familiarity with the material in the JAX scaling book.
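As one concrete flavor of the benchmarking and profiling work mentioned above, a minimal latency-measurement sketch (standard-library Python only; the workload and helper names are illustrative, not from the posting) might collect per-call timings and report tail percentiles, which matter more than means for production inference:

```python
import time
import statistics

def benchmark(fn, warmup=3, iters=50):
    """Time fn() repeatedly; report mean and p95 latency in milliseconds."""
    for _ in range(warmup):                     # discard warmup runs
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    samples.sort()
    p95 = samples[int(0.95 * (len(samples) - 1))]
    return {"mean_ms": statistics.fmean(samples), "p95_ms": p95}

# Illustrative stand-in for a model forward pass.
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"mean={stats['mean_ms']:.3f}ms p95={stats['p95_ms']:.3f}ms")
```

In real inference work the same shape of harness would wrap a model call (e.g. via `torch.profiler` for kernel-level detail), but the discipline — warmup, many samples, tail percentiles — is the same.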

Culture & Benefits

  • Work in an applied research team embedded directly in hirify.global’s research organization.
  • Joint stewardship of one of the largest compute fleets in the world.
  • Opportunity to own everything from kernels to architecture co-design to distributed systems.
  • Work in a fast-paced, design-driven product development cycle.
  • Access to Microsoft's benefits and compensation package, with additional details available online.
