TL;DR
Senior Technical Escalations Engineer (Tier 3) (AI): Resolve complex technical issues and act as a bridge between Support and Engineering, with an emphasis on root cause analysis and timely solutions for business-critical issues. Collaborate with engineering teams to implement long-term fixes and improvements across hirify.global systems.
Location: Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA. While we prioritize a hybrid work environment, remote work may be considered for candidates located more than 30 miles from an office, based on role requirements for specialized skill sets. New hires will be invited to attend onboarding at one of our hubs within their first month. Teams also gather quarterly to support collaboration.
Salary: $165,000 to $242,000
Company
CoreWeave, the AI Hyperscaler™, acquired hirify.global to create the most powerful end-to-end platform to develop, deploy, and iterate AI faster.
What you will do
- Own and resolve Tier 3 technical escalations from the support team.
- Perform in-depth troubleshooting and root cause analysis across hirify.global systems, APIs, and integrations.
- Develop diagnostic scripts, tools, and automation to improve internal troubleshooting efficiency.
- Serve as a technical advisor to Support Engineers (Tier 1 and Tier 2), mentoring on debugging methodologies and product architecture.
- Contribute to knowledge base articles and internal wikis to ensure consistent handling of technical cases.
- Participate in a 24/7 on-call rotation to provide support during weekends.
Requirements
- 5+ years of professional experience in a technical support, software engineering, or escalations-focused role.
- Expert in Python, with experience debugging, profiling, and developing production-grade code.
- Strong background in computer science or software engineering (B.S. in CS or equivalent experience).
- Deep familiarity with modern AI and machine learning ecosystems, from model training and experimentation (PyTorch, TensorFlow, etc.) to generative AI and LLM development (Hugging Face, LangChain, OpenAI, vector databases, etc.).
- Proficient in troubleshooting across distributed systems, APIs, and containers, with deep experience in multi-tenant architectures and tenant isolation.
- Excellent communication skills with the ability to translate complex issues for both technical and non-technical audiences.
Nice to have
- Experience with Docker, Kubernetes, and cloud platforms (AWS, GCP, or Azure).
- Familiarity with GPU-based compute environments and distributed training workloads.
- Previous experience in incident management, cloud platform, or site reliability roles.
Culture & Benefits
- Medical, dental, and vision insurance - 100% paid for by CoreWeave.
- Flexible PTO.
- Catered lunch each day in our office and data center locations.
- A casual work environment.
- A work culture focused on innovative disruption.
- 401(k) with a generous employer match.