Company hidden
4 months ago

Staff Site Reliability Engineer

$175,000–$250,000
Work format
onsite
Employment type
fulltime
Grade
senior/lead
English
b2
Country
US

Job description

TL;DR

Staff Site Reliability Engineer (DevOps): Own and manage internal systems infrastructure, including cloud and on-prem hardware, to deliver highly available, reliable, and automated systems, with an emphasis on infrastructure automation, monitoring, and incident response. Focus on designing and operating fault-tolerant distributed systems, migrating SaaS to self-hosted solutions, and collaborating with security and product teams.

Location: Sunnyvale, CA, United States (onsite)

Salary: $175,000–$250,000 annually

Company

hirify.global is an AI robotics company developing autonomous general-purpose humanoid robots engineered for home and commercial markets, headquartered in San Jose, CA.

What you will do

  • Own mission-critical infrastructure supporting source configuration management, CI/CD, software distribution, and manufacturing operations.
  • Migrate SaaS solutions to self-hosted platforms to improve security and reliability.
  • Implement monitoring, alerting, and incident response plans including runbooks and post-mortems.
  • Automate deployment and scaling to reduce manual workload.
  • Collaborate with stakeholders to define infrastructure needs and Service Level Objectives.
  • Partner with security teams to ensure timely application of security remediations and updates.

Requirements

  • Location: Must be able to work onsite in Sunnyvale, CA, United States
  • Strong Linux/Unix systems administration and programming/scripting skills.
  • Extensive experience with cloud platforms (Azure, AWS, GCP) and on-prem hardware architectures.
  • Proven ability to design, deploy, and operate high-availability, fault-tolerant distributed systems.
  • Mastery of infrastructure as code tools such as Terraform, CloudFormation, and Ansible.
  • Familiarity with monitoring and alerting tools like Prometheus, Grafana, and Datadog.
  • Solid understanding of networking fundamentals including TCP/IP, DNS, HTTP, load balancers, and firewalls.
  • Experience defining Service Level Objectives, developing runbooks, and managing incident response.
  • Excellent communication skills and ability to work cross-functionally.
