AI models can be trained to deceive, give fake information: Anthropic study
By
Biju Kumar
Artificial intelligence (AI) models can be trained to deceive and once a model exhibits deceptive behaviour, standard techniques could fail to remove such deception and create a false impression of safety, new research led by Google-backed AI startup Anthropic has found.
The team said that if they took an existing text-generating model like OpenAI’s ChatGPT and fine-tuned it on examples of desired behaviour and deception, then they could get the model to consistently behave deceptively.