TL;DR
Backend Engineer (Gen-AI Cloud): Architecting and building high-performance backend systems for AI solutions, with an emphasis on the critical API layer, intelligent load balancers, and workload schedulers. Focus on optimizing GPU resource allocation, implementing LLM orchestration, and ensuring reliability and scalability for massive-scale inference workloads.
Location: Remote or Hybrid
Salary: $120k–$280k
Company
hirify.global is building transformative AI solutions powered by high-performance infrastructure.
What you will do
- Build REST/GraphQL APIs and gateways for model inference and resource management.
- Develop smart load balancers to distribute requests based on latency, cost, and GPU availability.
- Implement token streaming, request queuing, and batching for massive-scale LLM inference workloads.
- Design GPU resource allocators that use bin-packing algorithms to optimize placement across the hardware topology.
- Implement preemption, checkpointing, and multi-tenant isolation.
- Build observability stacks (Prometheus, OpenTelemetry) to monitor SLOs for throughput and latency.
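To illustrate the kind of routing logic the load-balancing bullet describes, here is a minimal sketch in Python. All names and weights (`Replica`, the scoring formula, the pool names) are hypothetical examples, not part of any actual hirify.global system: each replica is scored on observed latency, per-token cost, and free GPU memory, and the request is routed to the lowest-scoring one.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    name: str
    p50_latency_ms: float      # observed median request latency
    cost_per_1k_tokens: float  # dollar cost of serving on this pool
    free_gpu_mem_gb: float     # GPU memory headroom on the replica

def score(r: Replica, w_latency: float = 1.0,
          w_cost: float = 50.0, w_mem: float = 2.0) -> float:
    """Lower is better: penalize latency and cost, reward GPU headroom.
    The weights are illustrative and would be tuned per workload."""
    return (w_latency * r.p50_latency_ms
            + w_cost * r.cost_per_1k_tokens
            - w_mem * r.free_gpu_mem_gb)

def pick_replica(replicas: list[Replica]) -> Replica:
    """Route the next request to the replica with the lowest score."""
    return min(replicas, key=score)

replicas = [
    Replica("a100-pool", p50_latency_ms=120, cost_per_1k_tokens=0.60, free_gpu_mem_gb=4),
    Replica("h100-pool", p50_latency_ms=45,  cost_per_1k_tokens=1.20, free_gpu_mem_gb=30),
    Replica("l4-pool",   p50_latency_ms=200, cost_per_1k_tokens=0.15, free_gpu_mem_gb=10),
]
print(pick_replica(replicas).name)  # fast, high-headroom pool wins here
```

In production this scoring would typically sit behind the API gateway and consume live metrics from the observability stack rather than static fields.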
Requirements
- 4–7+ years in backend/distributed systems or infrastructure engineering.
- Strong proficiency in Go, Python, or Rust.
- Deep experience in API design (REST/GraphQL).
- Proven expertise in service discovery, failover, message queues, and microservices.
- Solid production experience with AWS/GCP/Azure, Kubernetes, and Docker.
- Strong fundamentals in TCP/IP, DNS, TLS, and HTTP.
Nice to have
- Experience with GPU-accelerated workloads or HPC scheduling.
- Familiarity with LLM frameworks (vLLM, TensorRT-LLM) or Kubernetes internals.
- Experience with API gateways (Kong, NGINX) or service meshes (Istio).
- Background in high-throughput systems (100k+ req/sec).
Culture & Benefits