TL;DR

Senior AI Inference Engineer (llama.cpp): Build and enhance local AI inference engines such as llama.cpp and ONNX so they run efficiently on edge devices, with an emphasis on runtime optimization, stability, and integration. The work is close to the metal and enables private, fast on-device AI that does not rely on cloud infrastructure.

Location: 100% remote, work from anywhere in the world

Company

%hirify_global% is a pioneering company building cutting-edge solutions in digital finance, energy, AI, and education, empowering businesses and individuals globally.

What you will do

Deploy machine learning models to edge devices using frameworks like llama.cpp, ggml, and ONNX.
Collaborate closely with researchers to assist in coding, training, and transitioning models from research to production.
Integrate AI features into existing products, enriching them with the latest advancements in machine learning.
Port and enhance inference engines for efficient execution on edge devices.
Optimize the inference runtime to ensure faster model loading, leaner execution, and strong performance across different hardware.

Requirements

Excellent programming skills in C++.
Strong experience with the llama.cpp and ggml inference engines, including deploying models to specific GPU architectures.
Good understanding of deep learning concepts and model architectures.
Experience with transformers and LLMs.
Demonstrated ability to rapidly assimilate new technologies and techniques.
A degree in Computer Science, AI, Machine Learning, or a related field, complemented by a solid track record in AI R&D.

Culture & Benefits

A 100% remote team working from every corner of the world.
Opportunity to collaborate with bright minds in the fintech space.
Be part of a fast-growing, lean, and industry-leading company.
Excellent English communication skills required.
