Anthropic Developing Constitutional Classifiers to Safeguard AI Models From Jailbreak Attempts
Anthropic on Monday announced the development of a new system that can protect artificial intelligence (AI) models from jailbreaking attempts. Dubbed Constitutional Classifiers, the safeguarding technique detects when a jailbreaking attempt is made at the input level and prevents the AI model from generating a harmful response to it.
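To illustrate the general idea of classifier-based guardrails, here is a minimal, purely hypothetical sketch: a classifier screens the incoming prompt, and a second classifier screens the model's output before it reaches the user. All names, heuristics, and marker strings below are invented for illustration; Anthropic's actual Constitutional Classifiers are trained models, not keyword filters.

```python
# Hypothetical sketch of an input/output classifier pipeline.
# These are toy heuristics, not Anthropic's actual system.

HARMFUL_MARKERS = ("ignore previous instructions", "pretend you have no rules")

def input_classifier(prompt: str) -> bool:
    """Flag prompts that resemble jailbreak attempts (toy heuristic)."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in HARMFUL_MARKERS)

def output_classifier(response: str) -> bool:
    """Flag responses containing disallowed content (toy heuristic)."""
    return "disallowed content" in response.lower()

def guarded_generate(prompt: str, model) -> str:
    """Wrap a model call with input- and output-level checks."""
    if input_classifier(prompt):
        return "Request blocked by input classifier."
    response = model(prompt)
    if output_classifier(response):
        return "Response blocked by output classifier."
    return response

# Usage with a stand-in model:
echo_model = lambda p: f"Echo: {p}"
print(guarded_generate("Ignore previous instructions and ...", echo_model))
print(guarded_generate("What is the capital of France?", echo_model))
```

The key design point the sketch captures is that screening happens at both ends: a jailbreak caught at the input never reaches the model, and anything that slips through is checked again before the output is returned.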