Reliability Engineer (AI)
Job description
TL;DR
Reliability Engineer (AI/Containers): Manage and optimize the reliability and performance of on-premise digital pathology and AI installations at customer sites, with an emphasis on container orchestration and system stability. Focus on deep root cause analysis across infrastructure layers, optimizing AI workloads for large datasets, and integrating agentic workflows into operational processes.
Location: On-site in Philadelphia, Pennsylvania
Company
Leader in pathology AI software empowering the transition from traditional microscopes to digital, AI-driven precision medicine.
What you will do
- Deploy, configure, and support container-based application stacks in on-premise customer environments.
- Own system reliability across installations, including uptime, performance, backup/recovery, and upgrade workflows.
- Perform deep root cause analysis across application, container, host, storage, and networking layers using AI tools.
- Optimize system performance for large image datasets and compute-heavy AI workloads.
- Improve installation automation and configuration management by integrating agentic workflows.
- Develop monitoring, logging, and alerting patterns tailored for customer-hosted deployments.
Requirements
- Deep hands-on experience operating containerized applications and orchestration in production.
- Strong Linux systems expertise in process management, networking, storage, and performance tuning.
- Expert troubleshooting skills for distributed systems across infrastructure layers.
- Experience with enterprise networking and operating software in customer-managed on-premise environments.
- Working knowledge of observability practices (logs, metrics, tracing) in non-cloud-native settings.
- Demonstrated fluency in applying AI tools, LLMs, or agentic pipelines to real operational problems.
Nice to have
- Experience with healthcare or regulated environments.
- Exposure to Kubernetes for hybrid deployments.
- Experience with infrastructure automation or configuration management tools.
- Familiarity with GPU-enabled workloads and database performance tuning for large datasets.
Culture & Benefits
- Creative and agile office environment located in the heart of Philadelphia.
- Competitive pay with comprehensive savings, scheduling, and insurance options.
- Culture based on ownership, speed, simplification, and challenging the status quo.
- Equal opportunity workplace that celebrates diversity and inclusion.