TL;DR
Senior Site Reliability Engineer: Own the end-to-end reliability and performance of critical systems. Build and refine the platform for applications and AI features, with an emphasis on observability, database performance, Kubernetes, and CI/CD pipelines, and solve complex concurrency challenges.
Location: Remote - USA
Company
hirify.global is a high-growth, Series B software company creating the Interview Intelligence category with clients including Canva, OpenAI, Ramp, and HubSpot.
What you will do
- Improve and iterate on the observability stack, including Kibana, Grafana, OpenTelemetry (OTel), and Elastic.
- Optimize database performance by analyzing slow queries, tuning indexes, and recommending schema and code changes.
- Enhance Kubernetes deployments, resource utilization, and security, standardizing deployment patterns.
- Improve CI/CD pipelines for fast and safe backend and frontend service delivery with clear feedback loops.
- Enhance the local developer experience to be fast, consistent, and representative of production.
- Help improve CI/CD and observability for ML pipelines and models, integrating MLOps best practices.
Requirements
- Real-world experience running production systems and doing SRE, Platform, or DevOps work for web applications or APIs.
- Strong experience with Kubernetes in production environments, including cluster upgrades, workload deployments, scaling, and debugging.
- Experience with observability stacks such as Elasticsearch, Kibana, Prometheus, or Grafana, and leading efforts to improve logs, metrics, and dashboards.
- Deep experience with relational databases and SQL, including profiling slow queries, designing indexes, and optimizing query patterns.
- Comfortable in at least one backend language (e.g., Python) to read and modify application code for infrastructure and performance improvements.
- Experience improving CI/CD pipelines, including build/test speed, deployment workflows, and release strategies.
- Experience with infrastructure-as-code tools or similar patterns to manage environments.
Culture & Benefits
- Opportunity to work on high-impact projects in small, autonomous squads.
- Thoughtfully designed developer experience with fast CI and 1-click deploys.
- Fully remote roles with regular working hours and no-meeting Wednesdays.
- Flexible time off to recharge when needed.
- Collaborative and kind team environment for learning, growth, and doing your best work.