TL;DR
Site Reliability Engineer: Lead infrastructure strategy and technical direction by designing, building, and securing scalable infrastructure services, with an emphasis on automation, observability, and incident handling. Focus on driving improvement projects, managing monitoring and logging pipelines, and evolving world-class infrastructure while keeping it secure and scalable.
Location: Remote - Mexico
Company
hirify.global is a global community shaping the future of work through its Virtual First model, offering enterprise-level opportunities with a startup mindset to make work more intuitive, joyful, and human.
What you will do
- Ensure the reliability, scalability, and performance of hirify.global's infrastructure and services.
- Collaborate with cross-functional teams to develop and maintain best practices for monitoring, logging, and incident response.
- Build, implement, and maintain automations and infrastructure-as-code tooling, specifically Terraform, Ansible, and GitHub Actions.
- Use container orchestration platforms, such as Kubernetes, Amazon ECS, and Red Hat OpenShift, to manage containers at scale.
- Manage and optimize monitoring and logging pipelines using tools like Datadog and Cribl LogStream.
- Drive improvement projects related to service health and visibility and develop custom tooling in Bash, Python, and other scripting languages.
Requirements
- 5+ years of experience in site reliability engineering or similar engineering roles with hands-on coding experience.
- Strong knowledge of AWS services, including EC2, S3, RDS, Route 53, and Lambda, and strong Linux administration skills.
- Experience with monitoring and logging tools like Datadog and Cribl LogStream, and driving transformational programs for metrics and observability.
- Experience with scripting in a higher-level language (Python preferred) and developing automation with tools such as Chef, Ansible, and Terraform.
- Experience with containerization technologies like Docker and orchestration platforms like Kubernetes or Amazon ECS.
- Knowledge of LDAP, REST APIs, current authentication standards, GitHub, Git-based workflows, RDS databases, and network security technologies like WAF.
Nice to have
- Experience managing large-scale multi-cloud or hybrid infrastructure.
- Strong background in Infrastructure as Code (Terraform, Ansible) and GitOps workflows.
- Familiarity with Kubernetes, Docker, and serverless platforms.
- Proven track record of improving observability, reliability, and incident response.
- Understanding of compliance and security frameworks (SOC2, ISO 27001, FedRAMP).
- Experience implementing Zero Trust security and access models.
Culture & Benefits
- Medical, Dental & Vision allowance, plus Retirement, Critical Illness, Life & Income Protection allowance.
- Business Travel Protection: Travel medical and accident insurance.
- Flexible paid time off (PTO) policy in addition to statutory holidays.
- Perks Allowance to be used on wellness, learning and development, food & groceries, and much more.
- Comprehensive parental benefits including Parental Leave, Fertility Benefits, Adoption and Surrogacy support, and Lactation support.
- Mental health and wellness benefits.