Principal Software Engineer (Distributed Systems, AI)
Job description
TL;DR
Principal Software Engineer (Distributed Systems, AI): Design and build a unified inference platform for Ads, ensuring scalability, reliability, and efficiency, with an emphasis on GPU inference and acceleration technologies. Focus on optimizing model inference via batching, quantization, scheduling, memory management, and runtime optimization.
Location: Suzhou, China. Starting January 26, 2026, AI (MAI) employees who live within a 25-mile commute of a non-U.S., country-specific location are expected to work from the office at least four days per week.
Company
The company’s mission is to empower every person and every organization on the planet to achieve more.
What you will do
- Design and build a unified inference platform for Ads, ensuring scalability, reliability, and efficiency.
- Optimize model inference via batching, quantization, scheduling, memory management, runtime optimization, and other performance improvements.
- Develop, optimize, and maintain performance‑critical components for high‑throughput, low‑latency production inference, including GPU‑accelerated paths when applicable.
- Collaborate with algorithm/model teams to co‑design serving‑aware model architectures and optimizations.
- Profile and improve end‑to‑end system performance: concurrency, memory footprint, throughput, and latency.
- Provide senior technical leadership across teams; elevate engineering best practices and influence long‑term technical strategy.
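One of the responsibilities above, high-throughput serving, typically relies on dynamic batching: grouping concurrent requests so the model runs on a batch rather than one request at a time. The sketch below is purely illustrative and not part of any specific platform described here; the names (`MAX_BATCH`, `MAX_WAIT_MS`, `batch_requests`) are assumptions for the example.

```python
import queue
import time

MAX_BATCH = 8      # upper bound on batch size
MAX_WAIT_MS = 5    # max time to wait for more requests before flushing

def batch_requests(request_queue, deadline_ms=MAX_WAIT_MS, max_batch=MAX_BATCH):
    """Collect up to max_batch requests, waiting at most deadline_ms
    after the first request arrives. This trades a small amount of
    latency for much higher throughput on the accelerator."""
    batch = [request_queue.get()]                 # block for the first request
    deadline = time.monotonic() + deadline_ms / 1000.0
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(request_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return batch
```

In a real serving system the deadline and batch-size knobs are tuned against latency SLOs, and production runtimes (e.g. Triton's dynamic batcher) implement far more sophisticated variants of this loop.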
Requirements
- Bachelor’s Degree in Computer Science or a related technical field AND 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR equivalent experience.
- 6+ years’ experience building high‑performance, large‑scale distributed systems or ML infrastructure.
- Experience building and optimizing performance‑critical production systems.
- Experience working in Ads, Search, Recommendation systems, or other large‑scale online serving systems.
Nice to have
- Master’s Degree in Computer Science or a related technical field AND 8+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python; OR Bachelor’s Degree in the same fields AND 12+ years of such experience; OR equivalent experience.
- Experience with GPU inference runtimes such as TensorRT, ONNX Runtime, Triton, TRT‑LLM, or vLLM.
- Expertise in CUDA kernel development and GPU performance engineering.
- Familiarity with LLM / Transformer inference optimizations, including: sharding, tensor / KV‑cache parallelism, paged attention, continuous batching, quantization (FP8 / AWQ), and hybrid CPU–GPU orchestration.
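Quantization, one of the optimizations listed above, reduces memory footprint and bandwidth by storing weights in a low-precision integer format. A minimal sketch of per-tensor symmetric INT8 quantization follows; the function names are hypothetical and the example is deliberately simplified (real FP8/AWQ schemes use per-channel or group-wise scales and calibration).

```python
def quantize_int8(weights):
    """Map float weights to the int8 range [-127, 127] with a single
    symmetric scale factor (per-tensor quantization)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]
```

The round-trip error of each weight is bounded by the scale factor, which is why quantization preserves accuracy well when the weight distribution is not dominated by outliers.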
Culture & Benefits
- Employees come together with a growth mindset, innovate to empower others, and collaborate to realize shared goals.
- Build on values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.