Job description
TL;DR
AI Research Engineer: Build and operate multimodal data pipelines, training and evaluation infrastructure, and internal tooling, with an emphasis on reliability, performance, and cost. The role focuses on integrating these capabilities into Datadog's products and hardening prototypes into reliable services.
Location: New York, New York, USA
Salary: $140,000 — $400,000 USD
Company
Datadog is a global SaaS business delivering cloud monitoring and security solutions.
What you will do
- Build and operate multimodal data pipelines, training and evaluation infrastructure, benchmarks, and internal tooling.
- Implement models, run experiments at scale, and profile for reliability, performance, and cost.
- Build simulation environments and replay infrastructure for agent training and evaluation.
- Orchestrate distributed training and distributed RL with Ray, including scheduling, scaling, and failure recovery.
- Establish rigorous automated benchmarks and regression tests for world model predictions, agent performance, and simulation fidelity.
- Collaborate with Research Scientists, Product, and Engineering to integrate capabilities into Datadog's products and to harden prototypes into reliable services.
Requirements
- Depth in distributed computing, RL infrastructure, and ML systems for training and inference at scale; experience with Ray, Slurm, or similar frameworks is a plus.
- Proficient in Python, familiar with a systems language (e.g., Rust, C++, or Go), and comfortable with modern cloud and data infrastructure.
- Practical experience implementing and operating ML training and inference systems (e.g., PyTorch or JAX), including containerization, orchestration, and GPU acceleration.
- Practical experience with large-scale model training and fine-tuning, including frameworks like Megatron-LM, DeepSpeed, SkyRL, VeRL, or TorchTitan, and techniques such as SFT, RLVR, RLHF, and efficient inference (quantization, speculative decoding).
- Can explain design and performance trade-offs clearly to both technical and non-technical audiences.
- Experience supporting or contributing to research publications.
Nice to have
- Strong software engineering skills with experience in domains such as observability, SRE, or security.
- Experience bridging research prototypes and real-world product applications, especially with large foundation models, world models, or RL-trained agents.
- A passion for pushing the boundaries of AI with a focus on customer impact and scalable deployment.
- Hands-on experience with GPU programming and optimization, including CUDA.
- Experience writing production data pipelines and applications.
- Experience building simulation or sandbox environments for agent training.
Culture & Benefits
- Competitive global benefits.
- New hire stock equity (RSUs) and employee stock purchase plan (ESPP).
- Opportunity to collaborate closely with colleagues across the Datadog offices in New York City and Paris.
- Opportunity to attend and present at conferences and meetups.
- Intra-departmental mentor and buddy program for in-house networking.
- An inclusive company culture and the ability to join our Community Guilds (Datadog employee resource groups).
Similar jobs
Anthropic
5 days ago
Research Engineer (AI)
$340,000 - $425,000
Snowflake
3 days ago
Post-Doctoral Researcher (AI)
$160,000 - $220,000
Anthropic
5 days ago
Research Engineer, Machine Learning (AI)
$280,000 - $425,000
Reddit
2 days ago
Senior Research Engineer (AI)
$216,700 - $303,400
Anthropic
5 days ago
Performance Engineer (AI)
$315,000 - $560,000
Anthropic
5 days ago
Research Engineer Scientist (AI)
$315,000 - $340,000