TL;DR
Site Reliability Engineer (SRE): Design and implement automated, highly available, and scalable cloud infrastructure for a data platform and AI workloads, with an emphasis on observability, database performance, and incident response. The role focuses on optimizing machine learning model deployment, supporting high-traffic multi-tenant systems, and ensuring robust security and compliance.
Location: Remote (Global). Company hubs are in France, the UK, and the US.
Company
hirify.global is a startup empowering companies with technology to manage their climate impact and make a meaningful contribution to a better future.
What you will do
- Design, implement, and maintain highly available, scalable, and secure cloud infrastructure for the hirify.global Data platform and AI workloads.
- Improve and expand the observability strategy, using Datadog for Rails applications and AI workloads.
- Develop scalable infrastructure to support machine learning model training, deployment, and monitoring.
- Support critical infrastructure scaling projects and contribute to high-traffic systems design.
- Manage day-to-day operations, including on-call duties, capacity planning, and proactive system health monitoring.
- Implement security measures and data protection protocols, and ensure compliance with SOC 2 Type 2 and ISO 27001.
Requirements
- Engineering degree or 3+ years of DevOps/SRE experience; 5+ years preferred.
- Strong knowledge of AWS (ECS/Fargate), Docker, Terraform, and PostgreSQL at scale (sharding, clustering, high-volume scenarios preferred).
- Datadog expertise strongly preferred.
- Experience with continuous integration and continuous deployment, high-traffic multi-tenant systems, and database scaling strategies.
- Strong operational mindset with experience in on-call rotations and production incident management.
- Experience improving observability and monitoring systems.
- English: C1+ required (fluent).
Nice to have
- Ruby on Rails experience.
- Snowflake experience.
- Change Data Capture and data pipeline experience.
- Familiarity with ARC (Actions Runner Controller) and Kubernetes.
Culture & Benefits
- Opportunity to join an exciting startup with a vision to change the world by managing climate impact.
- Flexible work model to balance personal and professional commitments.
- A commitment to fostering a connected and engaged remote work culture with global colleagues.
- Part of a B Corporation dedicated to creating successful businesses that benefit society and the planet.