TL;DR
DevOps Engineer (AI): Build and optimize state-of-the-art machine learning infrastructure (cloud and on-premise) and architect platforms to create, train, and deploy ML models, with an emphasis on open-source and cloud computing technologies. The role focuses on keeping complex services highly available, scalable, and reliable in a Kubernetes environment using infrastructure as code, observability, monitoring, logging, and alerting tools.
Location: Hybrid in Noida, India
Company
hirify.global is a Series C Enterprise SaaS startup headquartered in Mountain View, California. It builds AI tools that augment humans, using Large Language Models to extract deep insights, improve customer engagement, and transform contact centers into strategic assets.
What you will do
- Design, build, and enhance state-of-the-art machine learning system infrastructure (cloud and on-premise) and architect platforms for ML model creation, training, and deployment.
- Build operating dashboards and charts to track system errors and performance, enabling root cause analysis.
- Identify gaps and evaluate relevant tools and technologies to improve processes and systems, leveraging open-source and cloud computing.
- Collaborate with the AI team to drive ML projects from conception to completion and production monitoring.
Requirements
- Bachelor's degree or higher with a strong academic record.
- 2-4 years of meaningful work experience in DevOps handling complex services.
- Strong troubleshooting skills to keep services highly available.
- Strong expertise with Google Cloud Platform (GCP), Docker, Kubernetes, CI/CD, and Jenkins.
- Extensive experience in designing, implementing, and maintaining infrastructure as code, preferably using Terraform.
- Experience creating and maintaining deployment manifests for microservices using Helm.
- Strong expertise in deploying at scale on Kubernetes clusters using the Horizontal Pod Autoscaler (HPA).
- Broad technical background and experience with architecture, design, and operations of cloud solutions and meeting security compliance requirements.
- Experience designing, implementing, and maintaining observability, monitoring, logging, and alerting with tools such as Prometheus, Grafana, Promtail, Loki, and Datadog.
Nice to have
- LLMOps or MLOps experience.
Culture & Benefits
- Market-leading compensation based on skills and aptitude.
- Work with an AI-native platform that leverages advanced technologies like Large Language Models.
- Opportunity to transform contact centers into strategic assets.
- Stay up to date with the latest AI innovations.