The role is well defined with a solid technology stack, but the extensive-experience requirement may narrow the candidate pool. The salary is competitive for the region, making this a good opportunity.
High salary · Clearly defined role · On-site requirement · Extensive experience required
- ONSITE position in Dubai (4 days per week in the office)
- Fluent in Russian
- English B2 or higher
Job Content
- Design and optimize AI inference pipelines ensuring low-latency, high-throughput model serving for enterprise applications.
- Build and maintain scalable AI infrastructure supporting complex, large-scale workloads efficiently.
- Enable reliable deployment and operation of high-performance AI model serving frameworks across environments.
- Ensure effective GPU resource utilization and cost-efficient AI workload execution.
- Establish comprehensive monitoring and observability for consistent model inference performance.
- Uphold enterprise-grade security, governance, and MLOps best practices throughout the AI delivery lifecycle.
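The monitoring and latency expectations above can be illustrated with a minimal, framework-free sketch. The helper below computes nearest-rank latency percentiles (p50/p99) from recorded inference timings; all names are illustrative, and a production setup would export such metrics to a monitoring system rather than compute them ad hoc.

```python
# Minimal sketch: latency percentiles for inference monitoring.
# Illustrative only; real deployments export metrics to a monitoring stack.

def latency_percentile(samples_ms, pct):
    """Return the pct-th percentile (nearest-rank method) of latency samples."""
    if not samples_ms:
        raise ValueError("no samples recorded")
    ordered = sorted(samples_ms)
    # Nearest-rank: ceil(pct/100 * n), 1-indexed; ceil via negated floor division.
    rank = max(1, -(-len(ordered) * pct // 100))
    return ordered[int(rank) - 1]

if __name__ == "__main__":
    timings = [12.1, 15.4, 11.8, 90.2, 13.0, 14.7, 12.5, 13.9, 16.2, 12.9]
    print(latency_percentile(timings, 50))  # median latency
    print(latency_percentile(timings, 99))  # tail latency
```

Tracking the p99 alongside the median is what surfaces tail-latency regressions that an average would hide.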
Essential Qualifications
- Bachelor's or equivalent degree
- 7+ years of total engineering or operations experience
- 5+ years of relevant experience in a similar role
- Experience in large, complex global enterprises characterized by high availability, high transaction rates, and geographical distribution
Essential Knowledge & Skills
- Deep Learning Inference: Expertise in TensorRT, vLLM, Triton, FasterTransformer.
- Model Optimization: Experience with ONNX, GGUF, quantization (FP16, INT8, FP8).
- Distributed Systems: Experience with NCCL, MPI, InfiniBand, RDMA, and multi-node GPU workloads.
- Scalable AI Serving: Hands-on experience with Triton Inference Server, vLLM, TensorFlow Serving.
- Profiling & Debugging: Familiarity with nvidia-smi, Nsight, nvprof, TensorRT Profiler.
- Cloud & On-Prem GPU Management: Experience with Kubernetes (K8s), OpenShift, GPU scheduling (Kubeflow, Ray, KServe).
- Understanding of vector databases and their applications in analytics and AI workloads.
- Proficiency in programming languages such as Python, Scala, and SQL.
- Experience working collaboratively on programming projects and managing the architecture of such projects.
- Advanced skills working in a Linux environment.
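As a plain-Python illustration of the quantization concept in the Model Optimization bullet, the sketch below performs symmetric INT8 quantization of a weight list and dequantizes it back. This is a conceptual stand-in only; real pipelines use TensorRT or ONNX Runtime calibration tooling.

```python
# Minimal sketch of symmetric INT8 quantization (the idea behind the
# FP16/INT8/FP8 bullet); real workflows rely on TensorRT/ONNX tooling.

def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] using one symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

if __name__ == "__main__":
    weights = [1.27, -1.27, 0.4, -0.02]
    codes, scale = quantize_int8(weights)
    approx = dequantize(codes, scale)
    print(codes, scale, approx)
```

The round-trip error per value is bounded by half a quantization step (scale / 2), which is the trade-off that makes INT8 serving cheaper than FP16 at a small accuracy cost.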
Nice to have
- GPU Programming: Knowledge of CUDA, cuDNN, NCCL, Tensor Cores for optimizing inference.
- Speculative Decoding & FlashAttention for LLM inference.
- Experience optimizing token streaming for chat applications.
- Experience with vector databases (Qdrant, Milvus) for RAG workloads.
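The token-streaming bullet above can be sketched with a plain generator. Everything here is a simplified stand-in: a real chat backend receives tokens incrementally from a serving engine such as vLLM and pushes them to the client over SSE or WebSockets.

```python
# Simplified stand-in for token streaming in a chat application.
# In production, tokens arrive incrementally from the inference engine
# and are forwarded to the client over SSE/WebSockets.

def stream_tokens(text, chunk_size=1):
    """Yield the reply in chunks (here: whitespace-split words)."""
    tokens = text.split()
    for i in range(0, len(tokens), chunk_size):
        yield " ".join(tokens[i:i + chunk_size])

def collect(stream):
    """Client side: reassemble streamed chunks into the full reply."""
    return " ".join(stream)

if __name__ == "__main__":
    reply = "Hello from the model"
    for chunk in stream_tokens(reply):
        print(chunk)  # each chunk rendered as it arrives
```

Tuning chunk size is one knob for the latency/overhead trade-off this bullet alludes to: smaller chunks reduce time-to-first-token on screen but increase per-message transport overhead.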
Benefits
- Opportunity to work on cutting-edge technologies in a highly innovative environment
- Dynamic and friendly work environment
- Company assistance with relocation expenses
- Medical insurance
If interested, please send your CV to ekostina@enfint.a