Staff Machine Learning Engineer, GenAI Platform (AI)

$253,300 - $354,600 USD

Work format

remote (USA only)

Job type

full-time

Grade

lead

English

Country

Vacancy from Hirify RU Global, a list of companies with Eastern European roots

Job description

TL;DR

Staff Machine Learning Engineer, GenAI Platform (AI): Architect and scale hirify.global's Generative AI and LLM platform capabilities, with an emphasis on designing resilient, large-scale distributed systems. The role focuses on building self-serve LLM workflows and developing comprehensive evaluation & benchmarking infrastructure.

Location: Remote (United States)

Salary: $253,300 - $354,600 USD

Company

hirify.global is a community of communities built on shared interests, passion, and trust, and is home to the most open and authentic conversations on the internet.

What you will do

Drive GenAI Infrastructure Strategy: Propose, design, and lead the architecture of our next-generation LLM platform.
Design Resilient, Large-Scale Distributed Systems: Architect highly fault-tolerant training infrastructure capable of supporting multi-week, distributed workloads across massive GPU clusters.
Build Self-Serve LLM Workflows: Design and implement robust, production-grade pipelines for LLM fine-tuning.
Develop Comprehensive Evaluation & Benchmarking Infrastructure: Build scalable systems for automated regression detection, structured metrics tracking, and complex inference-heavy evaluation patterns.
Architect Advanced Data Ingestion Pipelines: Extend our distributed data platforms to natively and efficiently handle the massive, multimodal datasets required for modern GenAI workloads, optimizing for throughput and dynamic batching.
Provide Technical Leadership & Mentorship: Analyze complex bottlenecks in distributed systems to optimize for performance and cost-efficiency.

Requirements

10+ years of work experience in a production software development environment or building complex distributed data systems, plus a degree in ML, Engineering, Computer Science, or a related discipline.
GenAI/LLM Infrastructure Expertise: Proven track record of designing and operating large-scale ML systems, specifically working with distributed training frameworks and LLM serving/inference optimization.
Distributed Systems Mastery: Hands-on experience managing fault-tolerant, petabyte-scale distributed systems and multi-node/multi-GPU training clusters.
Advanced MLOps Knowledge: Deep understanding of modern ML orchestration, fine-tuning pipelines, and model evaluation methodologies. Experience with tools like Ray, MLflow, or similar ecosystem standards.
GPU Experience: Hands-on experience with CUDA environments and GPU virtualization/containerization, all within Kubernetes.
Production Engineering Fundamentals: Hands-on experience with Kubernetes, Docker, and building production-quality code in Python and/or Go.

Culture & Benefits

Comprehensive Healthcare Benefits and Income Replacement Programs
401(k) with Employer Match
Global Benefit programs that fit your lifestyle, from workspace to professional development to caregiving support
Flexible Vacation & Paid Volunteer Time Off
Generous Paid Parental Leave
