Beware of Bias in Big Data, Feds Warn
A new report issued this week by the Federal Trade Commission applauds big data practitioners for helping to increase access to goods, services, healthcare, education, and employment. But the FTC also warned businesses and organizations to be aware of “hidden biases” that may creep into their calculations.
In “Big Data: A Tool for Inclusion or Exclusion,” a 50-page report, the FTC looks at both sides of the big data aisle. On the plus side, big data can benefit people of all stripes, according to the FTC.
The regulator cited the wide availability of personalized goods and services and targeted marketing as beneficial to consumers. Other ways that big data is helping the wider population include spotting students at risk of dropping out of school, increasing equal access to employment, enabling personalized medicine, and providing access to credit through non-traditional methods.
But big data has a darker side, the FTC warned. Its potential negatives include excluding people from opportunities through bias and inaccuracy, contributing to disparities in opportunity, making goods more expensive in low-income communities, increasing the potential for fraud, and raising the risk of data breaches.
The agency warns that the introduction of hidden bias has the potential to wreak havoc on big data projects. “If the process that generated the underlying data reflects biases in favor of or against certain types of individuals, then some statistical relationships revealed by that data could perpetuate those biases,” the FTC says in the report.
The problems caused by hidden biases are nothing new to statisticians, who have been trained to correct for them for more than a century. But because of the nature of big data analyses (the size of the data involved and the way data scientists try to extract truth from it), bias is becoming a bigger factor than other potential problems, such as statistical error, says David van Dyk, a professor at Imperial College London and a member of the American Statistical Association.
“The way we think about doing data analytics [is changing] because of the technology we have available to us and the data,” van Dyk tells Datanami. “It used to be you worry about statistical error, the error in small sample sizes. That’s not completely irrelevant, but now the problem is more that you have so much data, it’s more likely bias, because your sample is not representative of the truth. You can have a misrepresentation of the truth in the massive data you have.”
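To make van Dyk’s distinction concrete, here is a minimal Python simulation built on purely hypothetical numbers (not from the FTC report or from van Dyk). Half of an imaginary population belongs to “group A,” but in the biased case, members of group A are only a quarter as likely to end up in the data at all, for example because they are less likely to own the device that collects it. The function name estimate_share and its capture-rate parameters are illustrative inventions.

```python
import random

TRUE_SHARE = 0.5  # hypothetical: half the population belongs to group A


def estimate_share(n, capture_rate_a, capture_rate_b=1.0, seed=0):
    """Estimate group A's share of the population from n captured records.

    capture_rate_a and capture_rate_b are hypothetical probabilities that a
    member of each group shows up in the data set at all.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    captured = in_a_count = 0
    while captured < n:
        in_a = rng.random() < TRUE_SHARE  # draw a person from the population
        rate = capture_rate_a if in_a else capture_rate_b
        if rng.random() < rate:  # this person ends up in the data
            captured += 1
            in_a_count += in_a  # True counts as 1, False as 0
    return in_a_count / captured


for n in (100, 10_000, 200_000):
    representative = estimate_share(n, capture_rate_a=1.0)
    biased = estimate_share(n, capture_rate_a=0.25)
    print(f"n={n:>7}: representative={representative:.3f}  biased={biased:.3f}")
```

As n grows, the representative estimate tightens around the true 0.5; that shrinking wobble is the statistical error van Dyk says used to dominate. The biased estimate instead settles near 0.2, which is (0.5 × 0.25) / (0.5 × 0.25 + 0.5 × 1.0), and collecting more records only makes the wrong answer look more precise.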
In its report, the FTC cites a 2013 Harvard Business Review article, “The Hidden Biases in Big Data,” by Kate Crawford, a Microsoft (NASDAQ: MSFT) researcher. In her article, Crawford demonstrates how the city of Boston unknowingly introduced bias with a smartphone app that allows citizens to automatically report potholes.
“While certainly a clever approach, StreetBump has a signal problem,” Crawford writes. “People in lower income groups in the US are less likely to have smartphones, and this is particularly true of older residents, where smartphone penetration can be as low as 16 percent.” So while Boston’s leaders thought they were empowering residents in a democratic manner, they were in fact reducing the odds that potholes would get patched in poorer and older sections of the city.
“While having the ability to use more data can increase the power of the analysis, simply adding more data does not necessarily correct inaccuracies or remove biases,” the FTC says in its report. “In addition, the complexity of the data and statistical models can make it difficult for analysts to fully understand and explain the underlying model or its results. Even when data analysts are very careful, the results of their analysis may affect particular sets of individuals differently because their models may use variables that turn out to operate no differently than proxies for protected classes.”
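A toy illustration of the proxy problem the FTC describes, using an entirely made-up population: the hypothetical approve rule below never looks at group membership, only at a neighborhood label. Because neighborhood and group happen to be correlated in this invented data, the “group-blind” rule still produces very different outcomes for the two groups. The applicants list and both function names are illustrative assumptions, not anything from the report.

```python
from collections import Counter

# Hypothetical applicants as (group, neighborhood) pairs. The decision rule
# never sees "group"; it is kept here only to audit the outcomes afterward.
applicants = (
    [("A", "north")] * 80 + [("A", "south")] * 20
    + [("B", "north")] * 20 + [("B", "south")] * 80
)


def approve(neighborhood):
    """Hypothetical 'group-blind' rule that looks only at neighborhood."""
    return neighborhood == "south"


def approval_rate_by_group(people):
    """Audit step: share of each group that the rule approves."""
    approved, total = Counter(), Counter()
    for group, hood in people:
        total[group] += 1
        approved[group] += approve(hood)  # True adds 1, False adds 0
    return {g: approved[g] / total[g] for g in total}


rates = approval_rate_by_group(applicants)
print(rates)  # {'A': 0.2, 'B': 0.8}
```

Dropping the protected attribute from the inputs is not enough on its own; checking outcomes by group, as the audit function does here, is one way to notice when another variable is standing in for it.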
We may think that having a bigger data set increases the odds of having a clearer picture of reality. Unfortunately, that may not be true. “Data and data sets are not objective; they are creations of human design,” Microsoft’s Crawford writes. “We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.”
Depending on the project, companies may need to exercise extra care in their use of big data. For example, consider a case where big data shows that 30 percent of consumers who buy diapers will respond to an ad for baby formula. In this case, there’s no downside for the other 70 percent, who can simply disregard the ad.
“On the other hand,” the FTC writes, “if big data analytics are used as the basis for access to credit, housing, or other similar benefits, the potential effects on consumers from inaccuracies could be substantial.”
For example, if big data shows that people who don’t participate in social media are 30 percent more likely to be identity thieves, a wireless phone company may be inclined to flag those people as “risky.” If the company then requires the flagged individuals to submit additional documentation before they can get a contract, and those people fail to provide it and aren’t told why they were denied, that could be a violation of federal law.
In many situations, it’s illegal for companies and other institutions to discriminate against people, whether they do so knowingly or unknowingly. As a government regulator, the FTC is charged with enforcing anti-discrimination laws, and it does enforce them. In its report, the FTC discusses various enforcement actions it has taken against companies like Time Warner Cable, Instant Checkmate, Spokeo, Sequoia One, ChoicePoint, and CompuCredit.
Among the laws that big data practitioners can run afoul of is the Fair Credit Reporting Act, which requires credit reporting agencies to follow “reasonable procedures to ensure maximum possible accuracy” in their credit reports. Other big data pitfalls exist in the form of the Equal Credit Opportunity Act, the Americans with Disabilities Act, the Age Discrimination in Employment Act, the Fair Housing Act, the Genetic Information Nondiscrimination Act, and the Federal Trade Commission Act. Companies aiming to keep their analytics projects above the ethical water line would do well to keep these laws in mind.
“Given that big data analytics can have big consequences,” the FTC concludes, “it is imperative that we work together—government, academics, consumer advocates, and industry—to help ensure that we maximize big data’s capacity for good while identifying and minimizing the risks it presents.”
This article originally ran in our sister publication, Datanami.