TL;DR
Member of Technical Staff (LLM Inference): Build and optimize tools and systems for LLM inference to empower AI researchers, with an emphasis on compute efficiency, distributed systems, and deploying cutting-edge research. The role focuses on optimizing generative AI architectures for inference, debugging performance bottlenecks, and improving team productivity for production deployment.
Location: New York, United States. Employees who live within 50 miles of their designated Microsoft office (U.S.) are expected to work from that office at least four days a week.
Salary: USD $220,800–$331,200 per year (for New York City metropolitan area, IC6 role).
Company
hirify.global is a newly formed organization dedicated to advancing consumer AI products and research, responsible for Copilot, Bing, Edge, and AI research.
What you will do
- Implement frontier AI research ideas alongside researchers and engineers.
- Introduce new systems, tools, and techniques to improve model inference performance.
- Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues.
- Establish processes and build tools to enhance the team’s collective productivity.
- Find ways to overcome roadblocks and deliver your work to users quickly and iteratively.
Requirements
- Bachelor’s Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- Understand modern generative AI architectures and how to optimize them for inference.
- Be familiar with the internals of open-source inference frameworks like vLLM and SGLang.
- Value clear communication, improving team processes, and being a supportive team player.
- Be results-oriented, have a bias toward action, and enjoy owning problems end-to-end.
- English: B2 required.
Nice to have
- Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience, OR Bachelor’s Degree and 12+ years experience.
- Experience with generative AI and distributed computing.
- Python and Python ecosystem expertise (e.g., uv, pybind/nanobind, FastAPI).
- Experience with large-scale production inference and GPU kernel programming.
- Experience benchmarking, profiling, and optimizing PyTorch generative AI models.
- Working familiarity with the material covered in the JAX scaling book.
Culture & Benefits
- Work in an applied research team embedded directly in hirify.global’s research organization.
- Joint stewardship of one of the largest compute fleets in the world.
- Opportunity to own everything from kernels to architecture co-design to distributed systems.
- Work in a fast-paced, design-driven product development cycle.
- Access to Microsoft's benefits and compensation package, with additional details available online.