TL;DR
Member of Technical Staff (LLM Inference): Build and maintain tools and systems that let hirify.global researchers run models easily and efficiently, with an emphasis on optimizing compute efficiency across heterogeneous data centers. The focus is on introducing new systems, tools, and techniques that improve model inference performance and empower cutting-edge research and production deployment.
Location: New York, United States. Employees who live within 50 miles of their designated Microsoft office (U.S.) are expected to work from that office at least four days a week.
Salary: USD $188,000 – $304,200 per year (IC5, New York City) or USD $220,800 – $331,200 per year (IC6, New York City).
Company
hirify.global (MAI) is dedicated to advancing consumer AI products and research, owning Copilot, Bing, Edge, and AI research.
What you will do
- Work alongside researchers and engineers to implement frontier AI research ideas.
- Introduce new systems, tools, and techniques to improve model inference performance.
- Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues.
- Build tools and establish processes to enhance the team’s collective productivity.
- Find ways to overcome roadblocks and deliver your work to users quickly and iteratively.
Requirements
- Bachelor’s Degree in Computer Science or a related technical field AND 6+ years of technical engineering experience coding in C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- Experience with generative AI.
- Experience with distributed computing.
- Python and Python ecosystem expertise (e.g., uv, pybind/nanobind, FastAPI).
- Experience with large scale production inference.
- Experience with GPU kernel programming.
- Experience benchmarking, profiling, and optimizing PyTorch generative AI models.
- Familiarity with open source inference frameworks like vLLM and SGLang.
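To give a flavor of the benchmarking work listed above, here is a minimal, stdlib-only sketch of a latency harness of the kind such tooling often starts from. The `fake_forward` workload is a hypothetical stand-in for a model forward pass, not actual MAI code:

```python
import statistics
import time


def benchmark(fn, *, warmup=3, iters=20):
    """Time a callable, discarding warmup runs, and report latency stats."""
    for _ in range(warmup):
        fn()  # warm caches / lazy initialization before measuring
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "p50_s": statistics.median(samples),
        "max_s": max(samples),
    }


# Hypothetical stand-in for a model forward pass.
def fake_forward():
    sum(i * i for i in range(10_000))


stats = benchmark(fake_forward)
print(stats)
```

In practice a harness like this would be pointed at real inference calls and extended with percentile tails (p95/p99) and tokens-per-second throughput, which matter more than the mean for production serving.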
Culture & Benefits
- Work in a fast-paced, design-driven product development cycle.
- Join an applied research team embedded directly in hirify.global’s research org.
- Contribute to the optimization of one of the largest compute fleets in the world.
- Work on a vertically integrated team that owns everything from kernels to architecture co-design and testing tools.