Senior Site Reliability Engineer (Cloud)

Work format

remote (Canada only)

Employment type

full-time

Grade

senior

English

Country

Canada

Vacancy from Hirify Global, a list of international tech companies

Job description

TL;DR

Senior Site Reliability Engineer (Cloud): Designing and scaling fault-tolerant infrastructure for a global e-commerce shipping API, with an emphasis on Kubernetes, cloud orchestration, and system reliability. Focus on automating CI/CD pipelines, implementing disaster recovery solutions, and optimizing high-availability distributed systems.

Location: Remote (Canada)

Company

hirify.global is the shipping layer of the internet, providing logistics technology and infrastructure to connect merchants with carriers worldwide via a single API.

What you will do

Design, scale, and secure infrastructure through fault-tolerant architecture, performance tuning, and capacity planning.
Build and maintain automation, monitoring, and alerting systems, including disaster recovery solutions.
Ensure scalability and maintainability through microservices adoption and separation of concerns.
Enhance and maintain CI/CD pipelines to ensure smooth and safe production releases.
Verify system performance and correctness regarding response time and throughput.
Participate in on-call rotations and collaborate on peer design reviews for new features.

Requirements

Experience developing and troubleshooting highly available distributed systems, specifically with Kubernetes in production.
Extensive expertise with at least one public cloud provider (AWS, GCP, or Azure).
Exceptional verbal, written, and interpersonal communication skills.
Strong understanding of security practices, automation, and testing methods.
Familiarity with Redis, Elasticsearch, and Hadoop.
BS or MS degree in Computer Science or equivalent professional experience.

Nice to have

Advanced knowledge of PostgreSQL server configuration and optimization.
3+ years of professional software development experience.
Experience managing service meshes (e.g., Istio) and monitoring SLOs/SLAs.
Proficiency with monitoring tools such as New Relic, Prometheus, Grafana, or Datadog.
Knowledge of OpenTelemetry for distributed tracing and metrics collection.
Experience managing Python and Golang applications in production.

Culture & Benefits

Remote-first and globally distributed team environment.
Culture based on flexibility, trust, and autonomy.
Commitment to inclusivity and equal access to opportunities for all backgrounds.
Modern, scalable technology stack.
