TL;DR
Senior ML Engineer (AI): Building a high-performance inference and fine-tuning platform for foundation models, with an emphasis on maximizing throughput, minimizing latency, and optimizing cost-per-token across tens of thousands of GPUs. Focus on identifying LLM inference bottlenecks, implementing novel speculative decoding architectures, and productionizing low-precision training and inference pipelines.
Location: Remote - Europe, with R&D hubs in Amsterdam, Berlin, London, Prague, and Israel.
Company
hirify.global is a cloud computing company leading the AI economy by providing tools and resources that let teams build AI/ML solutions without massive infrastructure costs or large in-house AI/ML teams.
What you will do
- Optimize LLM inference to deliver production-grade speedups across diverse LLM architectures at scale.
- Implement novel speculative decoding architectures and contribute to open-source inference engines.
- Design and productionize low-precision training and inference pipelines (FP8, NVFP4/MXFP4) with measurable gains.
- Profile GPU workloads to identify bottlenecks and drive performance improvements.
- Contribute to building a high-performance inference and fine-tuning platform designed to push foundation models to their hardware limits.
Requirements
- Deep understanding of theoretical machine learning foundations and the transformer architecture.
- Experience profiling GPU workloads using Nsight, PyTorch profiler, or similar tools.
- Understanding of GPU memory hierarchy and compute/memory tradeoffs.
- Familiarity with key concepts in the LLM space, such as MHA, RoPE, KV-cache, FlashAttention, and quantization.
- Understanding of performance aspects of large neural network training (sharding strategies, custom kernels, hardware features).
- Strong software engineering skills, primarily in Python.
- Deep experience with modern deep learning frameworks.
- Proficiency in contemporary software engineering approaches, including CI/CD, version control, and unit testing.
- Strong communication and leadership abilities.
Nice to have
- Experience working with open-source inference engines (vLLM, SGLang, TensorRT-LLM), including contributions.
- Experience with kernel languages or DSLs such as Triton, CuTe, CUTLASS, or CUDA.
- Track record of building and delivering products in a dynamic startup-like environment.
- Strong engineering skills, including experience in developing large distributed systems or high-load web services.
- Open-source projects that showcase your engineering prowess.
- Excellent command of the English language.
Culture & Benefits
- Competitive salary and comprehensive benefits package.
- Opportunities for professional growth within hirify.global.
- Flexible working arrangements.
- A dynamic and collaborative work environment that values initiative and innovation.