Principal Machine Learning Engineer (AI)
Описание вакансии
TL;DR
Principal Machine Learning Engineer (AI): turning research into production-grade ML systems for a proactive AI that understands context, plans actions, and carries work forward. The role emphasizes end-to-end pipelines for data, training, evaluation, inference, and deployment; fine-tuning models with LoRA/QLoRA/SFT/DPO; architecting scalable inference systems; GPU optimization; and integrating ML into backend, mobile, and desktop products.
Location: Remote (Singapore)
Company
The company's A1 team is building a proactive AI system that understands conversations, plans actions, and advances work over time.
What you will do
- Build and own end-to-end ML pipelines spanning data, training, evaluation, inference, and deployment.
- Fine-tune models using LoRA, QLoRA, SFT, DPO, and distillation.
- Architect scalable inference systems balancing latency, cost, and reliability.
- Design data systems for synthetic and real-world training data.
- Implement evaluation pipelines for performance, robustness, safety, and bias.
- Own production deployment with GPU optimization, memory efficiency, and scaling.
- Collaborate with application engineering to integrate ML into products.
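The fine-tuning responsibilities above center on parameter-efficient methods such as LoRA, which adds a trainable low-rank update on top of a frozen pretrained weight matrix. A minimal sketch of the underlying math in pure Python (illustrative only — real pipelines use libraries like Hugging Face PEFT on top of PyTorch):

```python
# Minimal LoRA forward pass: y = W x + (alpha / r) * B (A x)
# W stays frozen; only the low-rank factors A (r x d_in) and B (d_out x r) train.

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def lora_forward(W, A, B, x, alpha, r):
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank adapter path
    scale = alpha / r                # standard LoRA scaling
    return [b + scale * d for b, d in zip(base, delta)]

# Toy example: d_in = d_out = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen identity weight
A = [[1.0, 1.0]]               # 1 x 2 down-projection
B = [[0.5], [0.5]]             # 2 x 1 up-projection
y = lora_forward(W, A, B, [2.0, 4.0], alpha=2.0, r=1)
```

Because only A and B receive gradients, the trainable parameter count drops from d_out × d_in to r × (d_in + d_out), which is what makes LoRA/QLoRA fine-tuning cheap.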
Requirements
- Strong background in deep learning and transformer-based architectures.
- Hands-on experience training, fine-tuning, or deploying large-scale ML models in production.
- Proficiency with PyTorch or JAX, and ability to learn others quickly.
- Experience with distributed training/inference frameworks like DeepSpeed, FSDP, Megatron, ZeRO, Ray.
- Strong software engineering for robust, production-grade systems.
- Experience with GPU optimization including memory efficiency, quantization, mixed precision.
- Comfort owning ambiguous, zero-to-one ML systems end-to-end.
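The GPU-optimization requirement above includes quantization. A minimal sketch of symmetric int8 weight quantization in pure Python (illustrative; production systems use fused kernels from libraries such as bitsandbytes or TensorRT):

```python
# Symmetric int8 quantization: map floats in [-max|w|, +max|w|] to [-127, 127].

def quantize_int8(weights):
    """Return int8 codes and the per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [qi * scale for qi in q]

w = [0.1, -0.5, 1.27, -1.0]
q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(w, w_hat))
```

The round-trip error is bounded by half a quantization step (scale / 2), which is the trade-off quantization makes: 4x less memory per weight versus a small, bounded loss of precision.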
Nice to have
- Experience with LLM inference frameworks like vLLM, TensorRT-LLM, FasterTransformer.
- Contributions to open-source ML or systems libraries.
- Background in scientific computing, compilers, or GPU kernels.
- Experience with RLHF pipelines (PPO, DPO, ORPO).
- Training or deploying multimodal or diffusion models.
- Large-scale data processing with Apache Arrow, Spark, Ray.
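DPO appears both in the fine-tuning responsibilities and in the RLHF nice-to-have. As a sketch of the core DPO objective for a single preference pair (pure Python with log-probabilities as plain floats; real pipelines use libraries such as TRL), the loss is -log σ(β · [(log πθ(y_w) − log πref(y_w)) − (log πθ(y_l) − log πref(y_l))]):

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_* : policy log-probs of the chosen / rejected responses
    ref_*  : frozen reference-model log-probs of the same responses
    """
    margin = (logp_chosen - ref_chosen) - (logp_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# Policy favors the chosen response more than the reference does -> small loss.
low = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=1.0)
# Policy favors the rejected response -> large loss.
high = dpo_loss(-3.0, -1.0, -2.0, -2.0, beta=1.0)
```

Unlike PPO-based RLHF, this needs no reward model or sampling loop, which is why DPO is often the first preference-tuning stage teams productionize.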
Culture & Benefits
- Small, high-talent-density, hands-on team making collective decisions at rapid speed.
- Balance between shipping high-quality work and learning through iteration.
- Bring structure, exercise judgment, and execute independently.
Hiring process
- Applications evaluated by technical team; 3-4 interviews via virtual meetings and/or onsite.
- Value transparency and efficiency with prompt decisions.
- Offers extended to those demonstrating exceptional skills and mindset.