2 дня назад

Director, AI Alignment and Interpretability (Cybersecurity)

195 000 - 290 000$

Формат работы

remote (только USA)

Тип работы

fulltime

Грейд

director

Английский

Страна

Вакансия из Hirify RU Global, списка компаний с восточно-европейскими корнями
Для мэтча и отклика нужен Plus

Мэтч & Сопровод

Для мэтча с этой вакансией нужен Plus

Описание вакансии

Текст:

TL;DR

Director, AI Alignment and Interpretability (Cybersecurity): Leading alignment and interpretability research for security-domain AI systems with an accent on mechanistic interpretability and offensive security concepts. Focus on building methods to explain model behavior, detecting misuse signals, and developing alignment methodologies to ensure models operate within intended bounds.

Location: Remote (USA)

Salary: $195,000 - $290,000 per year

Company

A global leader in cybersecurity providing an AI-native platform designed to stop breaches across all industries.

What you will do

Own the alignment and interpretability research agenda for security-domain AI and lead the hardest open problems.
Build and apply techniques for detecting offensive-misuse signals in model internals using probing and circuit analysis.
Develop alignment methodologies and measurable evaluation frameworks, including behavioral constraints and deployment guardrails.
Contribute original research through publications and external engagement in the field of AI safety.
Recruit and lead a lean team of research scientists while remaining an active technical contributor.

Requirements

MS or PhD in machine learning, computer science, or a related field.
8+ years in ML research or engineering with direct experience in interpretability or alignment on LLMs.
Hands-on expertise with mechanistic interpretability methods like probing classifiers, circuit analysis, and activation patching.
Experience designing alignment evaluations, including behavioral testing and red-lining methodologies.
Proven track record of leading and growing researchers.
Must be based in the USA.

Nice to have

Background in offensive security, vulnerability research, or adversarial ML.
Published research in mechanistic interpretability, AI alignment, or AI safety.
Experience applying interpretability methods to domain-specialized or fine-tuned models.
Familiarity with alignment challenges specific to models with dual-use capabilities.

Culture & Benefits

Competitive compensation and equity awards.
Comprehensive physical and mental wellness programs.
Paid parental and adoption leaves.
Professional development opportunities for all employees.
Culture of flexibility and autonomy to own your career.

Будьте осторожны: если работодатель просит войти в их систему, используя iCloud/Google, прислать код/пароль, запустить код/ПО, не делайте этого - это мошенники. Обязательно жмите "Пожаловаться" или пишите в поддержку. Подробнее в гайде →