Назад
Company hidden
4 дня назад

Site Reliability Engineer (Metal)

100 000 - 500 000$
Формат работы
hybrid
Тип работы
fulltime
Английский
b2
Страна
US/Canada
Вакансия из списка Hirify.GlobalВакансия из Hirify Global, списка международных tech-компаний
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:
/

TL;DR

Site Reliability Engineer (Metal): Ensuring reliability and operational health of large-scale AI systems across internal clusters and customer deployments with an accent on observability, automation, and troubleshooting complex multi-layer issues. Focus on designing monitoring and alerting systems, building automation to reduce operational toil, and partnering with engineering teams and customers to resolve production incidents.

Hybrid, based out of Toronto, ON; Austin, TX; or Santa Clara, CA. Offer contingent upon eligibility to access U.S. export-controlled technology; restrictions apply to nationals of certain countries (EAR Country Groups D:1, E:1, E:2).

Compensation: $100k - $500k including base and variable.

Company

Leading the industry in cutting-edge AI technology with high-performance RISC-V CPU, software models, compilers, platforms, networking, and semiconductors.

What you will do

  • Ensure reliability and operational health of hirify.global systems across internal and customer environments.
  • Troubleshoot complex issues across compute, networking, and software layers.
  • Partner with engineering teams and customers to resolve production incidents.
  • Design and improve monitoring, observability, and alerting systems.
  • Build automation to reduce operational toil and improve system reliability.

Requirements

  • Experienced in site reliability, infrastructure, or systems engineering in distributed environments.
  • Strong Linux systems knowledge with ability to troubleshoot complex multi-layer issues.
  • Proficient with observability tools such as Prometheus, Grafana, and alerting systems.
  • Comfortable with scripting and automation using Python, Go, or similar languages.
  • Solid understanding of networking fundamentals and systems at scale.
  • Eligible to access U.S. export-controlled technology per EAR regulations.

Culture & Benefits

  • Highly competitive compensation package.
  • Equal opportunity employer.
  • Value collaboration, curiosity, and commitment to solving hard problems.
  • Growing team with contributors of all seniorities.

Hiring process

  • Assessed for appropriate level during interview process; offers align with assessed level.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →