Company hidden
6 hours ago

Staff Site Reliability Engineer

Work format
remote (Argentina only)
Employment type
full-time
Grade
senior
English
B2
Country
Argentina

Job description

TL;DR

Staff Site Reliability Engineer (SRE/AI): Lead development of AI-assisted reliability tooling that analyzes tickets, logs, and traces to resolve outages faster, with an emphasis on observability, incident response, and automation. The role also covers defining SLO/SLI frameworks, scaling cloud operations for SaaS, and mentoring engineers on SRE practices.

Remote Argentina

Company

Domino builds a platform for AI-driven organizations to develop and operate data science and AI solutions at scale, serving customers like Johnson & Johnson, GSK, and UBS.

What you will do

  • Lead development of internal AI-assisted reliability tooling to analyze tickets, logs, traces, and documentation for faster outage resolution.
  • Improve observability coverage and signal quality for critical customer-facing systems.
  • Own end-to-end incident response, documentation, and prevention of recurrence.
  • Guide development of the product's customer-facing observability tools.
  • Define and mature SLO/SLI frameworks for priority services.
  • Scale cloud operations for single-tenant SaaS and improve deployment reliability.
  • Mentor engineers and shape SRE practices, workflows, and culture.

Requirements

  • Deep experience in SRE, platform engineering, or software engineering with operational ownership.
  • Fluency with Kubernetes, Linux, cloud platforms, and observability tooling for production problem-solving.
  • Strong ability to identify and close reliability gaps in products, tools, and processes.
  • Strong software engineering skills in Python or Go for building reliable internal tools.
  • Comfort leading ambiguous technical work and influencing teams without authority.
  • History of improving reliability through engineering and automation.
  • Strong communication skills and experience mentoring others in technical decision-making.
  • Sound judgment on AI/LLM tooling in operational workflows.

Nice to have

  • Experience with LLM-based systems, retrieval workflows, SaaS operations, or support/developer tooling.

Culture & Benefits

  • A growing, diverse team; people from all backgrounds are encouraged to apply.
  • Values a growth mindset, creative problem-solving, and seeking opportunities for success.
  • Emphasizes truth-seeking, authenticity, and continuous improvement in all areas.
  • A teaching and learning environment that equips employees for success.
