TL;DR
AI Inference Engineer: Developing and optimizing APIs and systems for real-time AI model inference, with an emphasis on large-scale deployment, benchmarking, and reliability. The role focuses on implementing LLM inference optimizations, GPU kernel programming, and improving system observability.
Location: London, United Kingdom
Company
hirify.global is a product company specializing in AI and software development.
What you will do
- Develop APIs for AI inference serving both internal and external customers
- Benchmark and address bottlenecks in the inference stack
- Improve system reliability and observability, respond to outages
- Explore and implement novel LLM inference optimizations
Requirements
- Experience with ML systems and deep learning frameworks such as PyTorch, TensorFlow, or ONNX
- Familiarity with LLM architectures and inference optimization techniques such as continuous batching and quantization
- Understanding of GPU architectures or experience with CUDA kernel programming
- Location: Must be based in London or able to work onsite
- English: B2 level or higher required