Model Policy (AI Safety)
Job Description
TL;DR
Model Policy (AI Safety): Define and maintain policies for frontier AI models in high-risk domains such as agentic systems, multimodal systems, and user safety, with an emphasis on behavioral specifications, evaluation criteria, and safeguards. The focus is on translating risks into measurable model behaviors, iterating on policies using red-teaming and deployment data, and operationalizing them across training, evaluation, and deployment.
Location: Hybrid in San Francisco office (3 days/week in office), relocation support offered.
Salary: $207K – $295K + equity
Company
AI research and deployment company building safe AGI to benefit humanity.
What you will do
- Design and maintain model policies for safety domains including dual-use, agentic, and emerging risks.
- Translate risk models into behavioral specs, evaluation criteria, grading guidance, and safeguards.
- Define boundaries between beneficial AI uses and harmful assistance.
- Build policy artifacts for model training, evaluation, and deployment.
- Partner with research, engineering, product, and operations teams to operationalize policies.
- Use red-teaming, deployment data, and observed failure modes to iterate on policies.
- Identify emerging risks and study real-world model behavior.
- Contribute to safety reports, system cards, and external communications.
- Design and run human data campaigns for policy measurement and improvement.
Requirements
- Must be based in San Francisco (hybrid model).
- Strong judgment on AI risks in ambiguous, high-impact areas.
- Experience with policies, taxonomies, harm/threat models, or risk frameworks for complex systems.
- Ability to structure fuzzy questions into policy frameworks and evaluation criteria.
- Comfort working with empirical evidence from evals, red-teaming, and deployments.
- Technical judgment about which model behaviors are trainable and measurable at scale.
- Ability to collaborate cross-functionally with research, engineering, product, and policy teams.
Culture & Benefits
- Hybrid workplace with 3 office days/week, optional remote Thursdays/Fridays.
- Relocation support for new employees.
- Open-plan offices with adjustable desks, meals, snacks, nap rooms, bike storage.
- Fast-paced, collaborative environment with shifting priorities based on models and risks.
- Equal opportunity employer with accommodations for disabilities.
- Background checks conducted in accordance with US law, including the San Francisco Fair Chance Ordinance.