Staff Site Reliability Engineer

$175,000–$250,000

Work format

onsite

Employment type

fulltime

Grade

senior/lead

English

Country

A vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Staff Site Reliability Engineer (DevOps): Own and manage internal systems infrastructure, including cloud and on-prem hardware, to deliver highly available, reliable, and automated systems, with an emphasis on infrastructure automation, monitoring, and incident response. Focus on designing and operating fault-tolerant distributed systems, migrating SaaS to self-hosted solutions, and collaborating with security and product teams.

Location: Sunnyvale, CA, United States (onsite)

Salary: $175,000–$250,000 annually

Company

hirify.global is an AI robotics company developing autonomous general-purpose humanoid robots engineered for home and commercial markets, headquartered in San Jose, CA.

What you will do

Own mission-critical infrastructure supporting source configuration management, CI/CD, software distribution, and manufacturing operations.
Migrate SaaS solutions to self-hosted platforms to improve security and reliability.
Implement monitoring, alerting, and incident response plans including runbooks and post-mortems.
Automate deployment and scaling to reduce manual workload.
Collaborate with stakeholders to define infrastructure needs and Service Level Objectives.
Partner with security teams to ensure timely application of security remediations and updates.

Requirements

Location: Must be able to work onsite in Sunnyvale, CA, United States
Strong Linux/Unix systems administration and programming/scripting skills.
Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures.
Proven ability to design, deploy, and operate high-availability, fault-tolerant distributed systems.
Mastery of infrastructure as code tools such as Terraform, CloudFormation, and Ansible.
Familiarity with monitoring and alerting tools like Prometheus, Grafana, and Datadog.
Solid understanding of networking fundamentals including TCP/IP, DNS, HTTP, load balancers, and firewalls.
Experience defining Service Level Objectives, developing runbooks, and managing incident response.
Excellent communication skills and ability to work cross-functionally.
