Company hidden
Posted 4 days ago

Senior Site Reliability Engineer (AI)

Work format: onsite
Employment type: full-time
Level: senior
English: B2
Country: Singapore
Job from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (AI): Ensure the reliability, performance, and scalability of AI products, model-serving infrastructure, and backend API systems, with an emphasis on automating operations and improving observability. The focus is on building resilient systems, solving complex infrastructure problems, and supporting AI workloads in production.

Location: This role is based in the Singapore office and may require up to one business trip per year.

Company

hirify.global is on a global mission to revolutionize the way the world games.

What you will do

  • Administer, monitor, and manage cloud-scale production environments for AI model APIs, backend services, and high-traffic web systems serving global users.
  • Design and implement fault-tolerant, autoscaling cloud architectures for AI inference workloads (including GPU-based environments) and other software products.
  • Build automated self-recovery systems to ensure high availability, rapid failover, and cost-efficient resource usage for all software products.
  • Manage and monitor AI model-serving platforms, inference engines, vector databases, data pipelines, and software applications.
  • Implement and maintain comprehensive monitoring, logging, and alerting for all AI and backend services.
  • Work closely with software engineering, ML engineering, and release management to enhance operational procedures, deployment processes, and incident response workflows.

Requirements

  • 5+ years of relevant experience in SRE, DevOps, infrastructure engineering, or cloud operations.
  • Experience operating production services with significant availability or scaling demands.
  • Strong knowledge of web technologies such as HTTP, REST, SSL, load balancers, and web proxies (NGINX).
  • Comfortable with Linux and Docker administration.
  • Basic knowledge of AWS, CI/CD (Jenkins), IaC (Terraform), container orchestration (AWS ECS or K8s), version control (Git), and databases (MySQL, NoSQL).
  • Strong coding and scripting skills (preferably Bash and Python).
  • Good analytical skills to debug deployment problems independently, without relying on developers.
  • Bachelor’s or Master’s degree in computer science, AI, or a similar discipline from an accredited institution.

Culture & Benefits

  • Opportunity to make a global impact while working with a team spread across 5 continents.
  • Gamer-centric #LifeAthirify.global experience that accelerates your growth, both personally and professionally.
  • Inclusive, respectful, and fair workplace for every employee across all the countries we operate in.
