TL;DR
Member of Technical Staff (LLM Inference): Build and optimize tools and systems for LLM inference to empower AI researchers, with an emphasis on compute efficiency, distributed systems, and deploying cutting-edge research. The role focuses on optimizing generative AI architectures for inference, debugging performance bottlenecks, and improving team productivity for production deployment.
Location: New York, United States. Employees who live within 50 miles of their designated Microsoft office (U.S.) are expected to work from that office at least four days a week.
Salary: USD $220,800–$331,200 per year (for New York City metropolitan area, IC6 role).
Company
hirify.global is a newly formed organization dedicated to advancing consumer AI products and research, responsible for Copilot, Bing, Edge, and AI research.
What you will do
- Implement frontier AI research ideas alongside researchers and engineers.
- Introduce new systems, tools, and techniques to improve model inference performance.
- Build tools to help debug performance bottlenecks, numeric instabilities, and distributed systems issues.
- Establish processes and build tools to enhance the team’s collective productivity.
- Find ways to overcome roadblocks and deliver your work to users quickly and iteratively.
Requirements
- Bachelor’s Degree in Computer Science or related technical field AND 6+ years of technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- Understand modern generative AI architectures and how to optimize them for inference.
- Be familiar with the internals of open-source inference frameworks like vLLM and SGLang.
- Value clear communication, improving team processes, and being a supportive team player.
- Be results-oriented, have a bias toward action, and enjoy owning problems end-to-end.
- English: B2 required.
Nice to have
- Master’s Degree in Computer Science or related technical field AND 8+ years technical engineering experience, OR Bachelor’s Degree and 12+ years experience.
- Experience with generative AI and distributed computing.
- Python and Python ecosystem expertise (e.g., uv, pybind/nanobind, FastAPI).
- Experience with large-scale production inference and GPU kernel programming.
- Experience benchmarking, profiling, and optimizing PyTorch generative AI models.
- Working familiarity with the material covered in the JAX scaling book.
Culture & Benefits
- Work in an applied research team embedded directly in hirify.global’s research organization.
- Joint stewardship of one of the largest compute fleets in the world.
- Opportunity to own everything from kernels to architecture co-design to distributed systems.
- Work in a fast-paced, design-driven product development cycle.
- Access to Microsoft's benefits and compensation package, with additional details available online.