Kyle Wiggers / TechCrunch:

Anthropic researchers: AI models can be trained to deceive, and the most commonly used AI safety techniques had little to no effect on the deceptive behaviors. Most humans learn the skill of deceiving other humans. So can AI models learn the same? Yes, it seems they can, and, terrifyingly, they're exceptionally good at it.