TL;DR
Member of Technical Staff - Pretraining Text Data (AI): Building and optimizing high-quality datasets for training and evaluating large language models, with an emphasis on data collection strategies, quality improvement, and ethical alignment. Focus on developing scalable data pipelines, analyzing real-world text datasets, and understanding data-driven model behaviors.
Location: New York, United States. Employees are expected to work from a designated Microsoft office at least four days a week if they live within 50 miles of that location.
Salary: USD $188,000 – $304,200 per year (for IC5 grade in New York City metropolitan area).
Company
hirify.global’s Superintelligence Team is a startup-like team within hirify.global, dedicated to advancing AI towards Humanist Superintelligence by creating ultra-capable, controllable, safety-aligned, and human-value-anchored systems.
What you will do
- Create high-quality datasets for training and evaluation, and run experiments to assess data impact.
- Develop and maintain scalable data pipelines for text data ingestion, preprocessing, filtering, and annotation.
- Analyze real-world text datasets to assess quality, diversity, and relevance, identifying areas for improvement.
- Build lightweight tools and workflows for dataset auditing, visualization, and versioning.
- Collaborate with Safety, Ethics, and Governance teams to ensure datasets meet standards for responsible AI practices.
Requirements
- Bachelor’s Degree in AI, Computer Science, Data Science, Statistics, Physics, Engineering, or a related technical discipline.
- 4+ years of technical engineering experience coding in Python with common data libraries (Pandas, NumPy, etc.).
- 2+ years of experience in data analysis or data engineering, including work with large-scale unstructured or semi-structured datasets.
- Proficiency in statistics and exploratory data analysis methods.
Nice to have
- Master’s Degree in AI, Computer Science, Data Science, Statistics, Physics, Engineering, or a related technical discipline.
- Familiarity with data processing frameworks such as Spark, Ray, or Apache Beam.
- Ability to communicate technical findings clearly to research and product teams.
Culture & Benefits
- Work with a growth mindset, innovate to empower others, and collaborate to achieve shared goals.
- Culture of inclusion built on values of respect, integrity, and accountability.
- Opportunity to work on next-generation AI models with a high-impact, cross-disciplinary team.
- Partnership with incredible product teams to reach billions of users.
- Access to additional benefits and compensation information via Microsoft careers.