Site Reliability Engineer

Job type

Full-time

English

Country

China/Vietnam

Job description

TL;DR

Site Reliability Engineer (AWS): Design, develop, and maintain reliable, scalable AWS infrastructure using Infrastructure as Code (IaC) and automation, with an emphasis on observability, incident management, and deployment reliability. The role focuses on building monitoring, alerting, and logging systems, running postmortems and root-cause analysis, and improving CI/CD and deployment automation to reduce downtime and alert fatigue.

Location: Chengdu

Company

hirify.global builds gamer-centric products and operates a global team across multiple continents.

What you will do

Build and maintain Infrastructure as Code (IaC) using Terraform or AWS CloudFormation.
Operate and troubleshoot AWS-based infrastructure (compute, containers, networking, storage, databases, messaging).
Own monitoring, alerting, and logging (e.g., CloudWatch, Prometheus, Grafana, ELK) and apply AIOps for predictive alerting and anomaly detection.
Handle incidents: on-call support, incident management, postmortems, root cause analysis, and continuous improvement.
Improve CI/CD pipelines and deployment automation, including zero-downtime and blue/green or canary deployments.
Collaborate on reliability, scalability, security, performance, and cost-efficiency; automate operations to reduce manual toil.

Requirements

Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or related field.
Minimum 3 years of experience in SRE, DevOps, cloud infrastructure, or system administration.
Hands-on AWS expertise across EC2/Lambda/ECS/EKS, Auto Scaling, VPC/Route 53/Security Groups, RDS/ElastiCache/Athena/S3, and SQS/SES.
Strong IaC experience with Terraform and/or AWS CloudFormation.
Proficiency in at least one scripting/programming language: Python, Node.js, Bash, or Ruby.
Experience with Linux/Windows and container-based environments, distributed systems, and monitoring/incident management processes.

Culture & Benefits

Global mission with a team distributed across 5 continents.
Inclusive, equal-opportunity workplace with accommodations where needed.
Gamer-centric culture and emphasis on accelerated personal and professional growth.

Hiring process

Application review, followed by interview steps that assess SRE/AWS skills and reliability practices.
