AI models can be trained to deceive, give fake information: Anthropic study

Artificial intelligence (AI) models can be trained to deceive, and once a model exhibits deceptive behaviour, standard techniques could fail to remove the deception and could instead create a false impression of safety, new research led by Google-backed AI startup Anthropic has found.

The team said that if they took an existing text-generating model such as OpenAI’s ChatGPT and fine-tuned it on examples of both desired behaviour and deception, they could get the model to consistently behave deceptively.
