TL;DR
Principal SRE Engineer (AI): Architect, build, and scale hirify.global’s next-generation infrastructure and AI-powered capabilities, with an emphasis on resilient cloud architectures, robust automation, and integrating AI systems into production environments. The role focuses on strengthening reliability, observability, and security while mentoring engineering teams and championing DevOps standards.
Location: Fully remote for most hiring locations. Candidates based in Singapore or Australia will be required to work from the office 2–3 days per week (hybrid format).
Company
hirify.global is a fast-growing tech-enabled unicorn that provides back-office services for micro SMEs through proprietary software and AI, operating across corporate secretary, accounting, and FinTech payment segments.
What you will do
- Lead the architecture, building, and scaling of hirify.global’s next-generation infrastructure and AI-powered capabilities.
- Define infrastructure strategy, design resilient cloud architectures, and ensure platform security, scalability, and performance.
- Integrate AI systems into production environments for efficient delivery, observability, and reliability.
- Drive robust automation across CI/CD, infrastructure provisioning, and operations to enhance reliability and reduce manual overhead.
- Strengthen system reliability, incident response, SLIs/SLOs, and operational excellence through monitoring, performance tuning, and capacity planning.
- Provide technical leadership and mentorship to elevate platform engineering and DevOps maturity across the organization.
Requirements
- 8+ years of progressive experience in Site Reliability Engineering (SRE).
- 5+ years of hands-on experience across multi-cloud environments (AWS, GCP, Azure) with expertise in networking, compute, storage, security, and cost optimization.
- 5+ years of deep expertise in containerization and orchestration (e.g., Kubernetes, EKS, ECS).
- 8+ years of extensive experience with Infrastructure as Code (IaC) (e.g., Terraform, Pulumi, CloudFormation).
- Proven ability to design, build, and operate highly reliable, scalable production systems using advanced zero-downtime deployment patterns.
- Experience supporting or deploying AI/ML workloads (e.g., model inference, vector databases, GPU workloads).
- Excellent communication and collaboration skills with a proven ability to mentor engineers.
Nice to have
- Expertise in modernizing deployments via GitOps practices (e.g., ArgoCD, Flux) and building self-service developer platforms.
- Experience implementing and managing multi-cloud API gateways and edge routing solutions.
- Strong background in platform security, including secrets management, IAM, and runtime security hardening.
- Familiarity with modern languages, runtimes, and frameworks such as Python, Node.js, and NestJS.
Culture & Benefits
- Culture of humility, kindness, diversity, and inclusion.
- Flexible work environment: fully remote for most locations; hybrid (2–3 days in office) for employees based in Singapore or Australia.
- Opportunity to work fully remote from anywhere in the world for 1 month each year.
- Competitive market salaries, generous paid time off, and a potential employee share ownership plan.
- Significant responsibility, autonomy, and personal growth opportunities with internal and external training programs.
- The company is a certified B Corp, committed to sustainability and social impact.
Hiring process
- Screening call (~30 min) with Talent Acquisition.
- 2–3 hour take-home SRE assessment.
- ~90-minute technical interview with the Delivery Lead and/or Head of Engineering.
- Final interview with CTO.
- Background screening is required because the company is a regulated entity (education, criminal history, political exposure, credit history).