TL;DR
Senior Software Engineer (AI): Build and operate a core platform for large-scale AI and Document Understanding products, with an emphasis on distributed systems, high-throughput model serving, and complex asynchronous training workflows. The focus is on solving hard concurrency, performance, and distributed systems problems within a Rust-based infrastructure.
Location: London
Company
hirify.global creates category-leading enterprise software that leverages the transformative power of automation.
What you will do
- Design, build, and operate the core Machine Learning Services (MLS) platform, including a Rust-based API gateway, Python ML compute workers, and a distributed job queue.
- Solve hard concurrency, performance, and distributed systems problems to ensure the platform is robust for high-volume production workloads.
- Collaborate with product and ML science teams to build scalable infrastructure for various ML models, from GenAI to specialized classifiers.
- Develop a custom-built, content-addressable storage abstraction layer over cloud object stores (GCS, S3, Azure Blob) with garbage collection and sharding logic.
- Enhance an asynchronous job-queueing system, built from scratch on the storage layer using compare-and-swap primitives for atomicity.
- Dive deep into the entire stack, from Kubernetes and container orchestration to gRPC-based service communication and ONNX-based inference performance tuning on GPU-accelerated hardware.
Requirements
- 5+ years of experience engineering and architecting large-scale, distributed commercial services.
- Deep proficiency in a systems-level language (Rust, C++, or Go), with a willingness to become an expert in Rust.
- Strong Python skills are critical.
- Real-world experience with cloud ecosystems (Azure, AWS, or GCP) and containerization (Docker, Kubernetes).
- A firm grasp of concurrency, multithreading, and asynchronous programming.
- Pragmatic understanding of computer science fundamentals, focusing on real-world problem-solving with data structures and algorithms.
- Ability to articulate opinions on good code and architecture, contributing to continuous improvement.
- English: B2 required
Nice to have
- Experience with Rust in a production environment.
- Experience with MLOps, particularly managing model lifecycle in a multi-tenant, high-availability system.
- Familiarity with building ML inference services, model serialization (ONNX), and GPU programming (CUDA).
- Experience building or working on custom storage or job-queueing systems.
Culture & Benefits
- Be part of a team committed to creating category-leading enterprise software in automation.
- Work with curious, self-propelled, generous, and genuine individuals in a fast-moving, fast-thinking growth company.
- Contribute to a culture that values simplicity, correctness, and peer review.
- Join a diverse and inclusive workplace that provides equal opportunities to all persons.
- Benefit from flexibility in when and where work gets done, depending on team needs (general company policy).