Researchers find that models trained using common data-collection techniques judge rule violations more harshly than humans would.
Machine-learning models often make harsher judgments than humans because they are trained on the wrong type of data, which can have serious real-world implications, according to researchers from MIT and elsewhere.
In an effort to improve fairness or reduce backlogs, machine-learning models are sometimes designed to mimic human decision-making, such as deciding whether social media posts violate toxic content policies.
But researchers from MIT and elsewhere have found that these models often do not replicate human decisions about rule violations. If models are not trained with the right data, they are likely to make different, often harsher judgments than humans would.
In this case, the “right” data are those that have been labeled by humans who were explicitly asked whether items defy a certain rule. Training involves showing a machine-learning model millions of examples of this “normative data” so it can learn a task.
But data used to train machine-learning models are typically labeled descriptively — meaning humans are asked to identify factual features, such as, say, the presence of fried food in a photo. If “descriptive data” are used to train models that judge rule violations, such as whether a meal violates a school policy that prohibits fried food, the models tend to over-predict rule violations.
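To make the distinction concrete, here is a minimal sketch (with hypothetical field names and data, not the paper's) of the two labeling pipelines: descriptive annotators report only factual features, and a rule is applied afterward to infer a violation, while normative annotators are shown the policy and judge it directly.

```python
# Hypothetical sketch of the two labeling pipelines; field names, policy,
# and data are illustrative, not taken from the study.

def descriptive_label(features: dict) -> bool:
    """Annotator reports factual features only; the rule is applied afterward."""
    # School-meal policy from the example above: fried food is prohibited.
    return features["contains_fried_food"]

def normative_label(judgment: dict) -> bool:
    """Annotator sees the policy itself and judges whether it is violated."""
    return judgment["violates_fried_food_policy"]

# The same meal can end up with different labels under the two schemes:
meal = {
    "contains_fried_food": True,          # descriptive: the feature is present
    "violates_fried_food_policy": False,  # normative: annotator judged it acceptable
}
print(descriptive_label(meal), normative_label(meal))  # True False
```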
This drop in accuracy could have serious implications in the real world.
Marzyeh Ghassemi is senior author of a new paper detailing these findings, which was published on May 10 in the journal Science Advances.
In each case, the descriptive labelers were asked to indicate whether three factual features were present in the image or text, such as whether the dog appears aggressive. Their responses were then used to craft judgments. (If a user said a photo contained an aggressive dog, then the policy was violated.) The labelers did not know the pet policy. On the other hand, normative labelers were given the policy prohibiting aggressive dogs, and then asked whether it had been violated by each image, and why.
The researchers found that humans were significantly more likely to label an object as a violation in the descriptive setting. The disparity, computed as the average absolute difference in labels, ranged from 8 percent on a dataset of images used to judge dress code violations to 20 percent for the dog images.
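Read literally, that disparity metric is simply the mean absolute gap between the two sets of binary labels for the same items; a toy sketch (illustrative numbers, not the study's data) might look like this:

```python
import numpy as np

# Hypothetical binary labels (1 = violation) for the same eight items under
# the two annotation settings; the values are illustrative only.
descriptive = np.array([1, 1, 0, 1, 1, 0, 1, 0])
normative   = np.array([1, 0, 0, 1, 0, 0, 1, 0])

# Average absolute difference in labels, as described in the text.
disparity = np.mean(np.abs(descriptive - normative))
print(f"label disparity: {disparity:.0%}")  # 25% in this toy example
```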
“While we didn’t explicitly test why this happens, one hypothesis is that maybe how people think about rule violations is different from how they think about descriptive data. Generally, normative decisions are more lenient,” says lead author Aparna Balagopalan.
Yet data are usually gathered with descriptive labels to train a model for a particular machine-learning task. These data are often repurposed later to train different models that perform normative judgments, like rule violations.
Training troubles
To study the potential impacts of repurposing descriptive data, the researchers trained two models to judge rule violations using one of their four data settings. They trained one model using descriptive data and the other using normative data, and then compared their performance.
They found that a model trained on descriptive data underperforms a model trained to make the same judgments using normative data. Specifically, the descriptive model is more likely to misclassify inputs by falsely predicting a rule violation, and its accuracy drops even further on objects that human labelers disagreed about.
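A minimal sketch of that comparison, assuming synthetic features and off-the-shelf scikit-learn classifiers rather than the authors' actual models and datasets:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                    # item features (synthetic)
# Normative labels: violation only when the signal clearly crosses the policy line.
y_normative = (X[:, 0] > 0.5).astype(int)
# Descriptive labels: the raw feature is flagged more aggressively, so more positives.
y_descriptive = ((X[:, 0] > 0.2) | (rng.random(1000) < 0.05)).astype(int)

X_test = rng.normal(size=(500, 5))
y_test = (X_test[:, 0] > 0.5).astype(int)         # held-out normative judgments

for name, labels in [("normative-trained", y_normative), ("descriptive-trained", y_descriptive)]:
    model = LogisticRegression().fit(X, labels)
    pred = model.predict(X_test)
    accuracy = np.mean(pred == y_test)
    false_violations = np.mean((pred == 1) & (y_test == 0))
    print(f"{name}: accuracy={accuracy:.2f}, false-violation rate={false_violations:.2f}")
```

In this toy setup the descriptive labels flag more items than the normative policy would, so the descriptively trained model predicts violations for items the normative test labels consider acceptable.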
“This shows that the data do really matter. It is important to match the training context to the deployment context if you are training models to detect if a rule has been violated,” Balagopalan says.
It can be very difficult for users to determine how data have been gathered; this information can be buried in the appendix of a research paper or not revealed by a private company, Ghassemi says.
Improving dataset transparency is one way this problem could be mitigated. If researchers know how data were gathered, then they know how those data should be used. Another possible strategy is to fine-tune a descriptively trained model on a small amount of normative data. This idea, known as transfer learning, is something the researchers want to explore in future work.
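One way that fine-tuning strategy could look in outline (a sketch assuming a simple linear model and synthetic data, not the researchers' proposed setup):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(1)

# Large descriptively labeled set and a small normatively labeled set (both synthetic).
X_desc = rng.normal(size=(5000, 5))
y_desc = (X_desc[:, 0] > 0.2).astype(int)   # descriptive labels flag more violations
X_norm = rng.normal(size=(200, 5))
y_norm = (X_norm[:, 0] > 0.5).astype(int)   # normative labels are more lenient

model = SGDClassifier(loss="log_loss", random_state=0)
# Pre-train on the plentiful descriptive labels...
model.partial_fit(X_desc, y_desc, classes=np.array([0, 1]))
# ...then fine-tune on the small normative sample to shift the decision boundary.
for _ in range(20):
    model.partial_fit(X_norm, y_norm)
```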
They also want to conduct a similar study with expert labelers, like doctors or lawyers, to see whether expert labeling leads to the same label disparity.
“The way to fix this is to transparently acknowledge that if we want to reproduce human judgment, we must only use data that were collected in that setting. Otherwise, we are going to end up with systems that are going to have extremely harsh moderations, much harsher than what humans would do. Humans would see nuance or make another distinction, whereas these models don’t,” Ghassemi says.
Reference: “Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data” by Aparna Balagopalan, David Madras, David H. Yang, Dylan Hadfield-Menell, Gillian K. Hadfield and Marzyeh Ghassemi, 10 May 2023, Science Advances.
DOI: 10.1126/sciadv.abq0701
This research was funded, in part, by the Schwartz Reisman Institute for Technology and Society, Microsoft Research, the Vector Institute, and a Canada Research Council Chair.