TL;DR
Member of Technical Staff (LLM Inference): Build and maintain tools and systems that let hirify.global researchers run models easily and efficiently, with an emphasis on optimizing compute efficiency across heterogeneous data centers. The focus is on introducing new systems, tools, and techniques that improve model inference performance and empower cutting-edge research and production deployment.
Location: New York, United States. Employees who live within 50 miles of their designated Microsoft office (U.S.) are expected to work from that office at least four days a week.
Salary: USD $188,000 – $304,200 per year (IC5, New York City) or USD $220,800 – $331,200 per year (IC6, New York City).
Company
hirify.global (MAI) is dedicated to advancing consumer AI products and research, owning Copilot, Bing, Edge, and AI research.
What you will do
- Work alongside researchers and engineers to implement frontier AI research ideas.
- Introduce new systems, tools, and techniques to improve model inference performance.
- Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues.
- Build tools and establish processes to enhance the team’s collective productivity.
- Find ways to overcome roadblocks and deliver your work to users quickly and iteratively.
Requirements
- Bachelor’s Degree in Computer Science or a related technical field AND 6+ years of technical engineering experience coding in C, C++, C#, Java, JavaScript, or Python, OR equivalent experience.
- Experience with generative AI.
- Experience with distributed computing.
- Python and Python ecosystem expertise (e.g., uv, pybind/nanobind, FastAPI).
- Experience with large scale production inference.
- Experience with GPU kernel programming.
- Experience benchmarking, profiling, and optimizing PyTorch generative AI models.
- Familiarity with open source inference frameworks like vLLM and SGLang.
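To give a flavor of the benchmarking work listed above, here is a minimal, stdlib-only sketch of a latency harness of the kind such tooling often starts from. The `fake_forward` workload is a hypothetical stand-in for a model forward pass, not actual MAI code:

```python
import statistics
import time


def benchmark(fn, *, warmup=3, iters=20):
    """Time a callable, discarding warmup runs, and report latency stats."""
    for _ in range(warmup):
        fn()  # warm caches / lazy initialization before measuring
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "p50_s": statistics.median(samples),
        "max_s": max(samples),
    }


# Hypothetical stand-in for a model forward pass.
def fake_forward():
    sum(i * i for i in range(10_000))


stats = benchmark(fake_forward)
print(stats)
```

In practice a harness like this would be pointed at real inference calls and extended with percentile tails (p95/p99) and tokens-per-second throughput, which matter more than the mean for production serving.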
Culture & Benefits
- Work in a fast-paced, design-driven product development cycle.
- Join an applied research team embedded directly in hirify.global’s research org.
- Contribute to the optimization of one of the largest compute fleets in the world.
- Work on a vertically integrated team that owns everything from kernels to architecture co-design and testing tools.