Language models might be able to self-correct biases—if you ask them

The second test used a data set designed to check how likely a model is to assume the gender of someone in a particular profession, and the third tested for how much race affected the chances of a would-be applicant’s acceptance to a law school if a language model was asked to do the selection—something that, thankfully, doesn’t happen in the real world.

The team found that just prompting a model to make sure its answers didn’t rely on stereotyping had a dramatically positive effect on its output, particularly in models that had completed enough rounds of RLHF and had more than 22 billion parameters, the variables in an AI system that get tweaked during training. (The more parameters, the bigger the model. GPT-3 has around 175 billion parameters.) In some cases, the model even started to engage in positive discrimination in its output.

Crucially, as with much deep-learning work, the researchers don’t really know exactly why the models are able to do this, although they have some hunches. “As the models get larger, they also have larger training data sets, and in those data sets there are lots of examples of biased or stereotypical behavior,” says Ganguli. “That bias increases with model size.”

But at the same time, somewhere in the training data there must also be some examples of people pushing back against this biased behavior—perhaps in response to unpleasant posts on sites like Reddit or Twitter, for example. Wherever that weaker signal originates, the human feedback helps the model boost it when prompted for an unbiased response, says Askell.

The work raises the obvious question of whether this “self-correction” could and should be baked into language models from the start.
