Company hidden
3 days ago

Site Reliability Engineer (SRE) (AI)

Work format: On-site
Employment type: Full-time
Level: Middle
English: B2
Country: Taiwan
Listed on Hirify Global, a list of international tech companies.

Job description

TL;DR

Site Reliability Engineer (SRE) (AI): Design and maintain highly reliable, observable, and secure cloud and edge infrastructure for AI-driven products, with an emphasis on observability, Kubernetes orchestration, and system security. The role focuses on automating operational tasks, managing SLOs/SLIs, and keeping systems resilient through proactive monitoring and incident response.

Location: On-site in Taipei City, Taiwan

Company

The company (name not disclosed in this listing) is an innovative technology company operating large-scale cloud and edge infrastructure that powers AI-driven products and services.

What you will do

  • Design and maintain monitoring, alerting, and dashboarding systems to build visibility into system health via metrics, logs, and traces.
  • Deploy, manage, and optimize containerized workloads running on Kubernetes across production and edge environments.
  • Implement secure access controls and monitor for cybersecurity threats and service disruptions.
  • Automate repetitive operational tasks and build tooling to streamline infrastructure and CI/CD workflows.
  • Participate in on-call rotations, lead troubleshooting for production incidents, and conduct root-cause analysis.
  • Collaborate with AI, ML, hardware, and product teams to ensure new services are production-ready.

Requirements

  • 3+ years of experience in SRE, DevOps, Platform Engineering, or Production Operations.
  • Hands-on experience with AWS or other major cloud platforms.
  • Proficiency with Docker, Kubernetes, and Infrastructure as Code tools like Terraform.
  • Strong understanding of observability tools such as Grafana and Prometheus.
  • Solid Linux administration skills and proficiency in Python or Bash.
  • Must be based in, or able to work on-site in, Taipei City, Taiwan.

Nice to have

  • Experience operating large-scale edge computing or IoT deployments.
  • Familiarity with zero-trust access management or security operations (threat detection).
  • Exposure to AI infrastructure, LLM-based applications, or AI-Ops solutions.
  • Knowledge of compliance frameworks such as ISO 27001.
