Copyright © 2018 DataScience.US All Rights Reserved.
The Problem of Biased Algorithms and How to Prevent Them
When we talk about discriminatory algorithms, we usually mean “discriminatory” in the sense of “able to tell the difference between two types of items.”
However, algorithms can be discriminatory in a second, more troubling sense. If data scientists are not careful in how they construct their algorithms, those algorithms can end up biased against people of certain races, ages, and sexes.
How can an algorithm be biased and discriminatory? Are there any solutions to this conundrum?
Examples of Bias in Algorithms
It is rarely the intent of an algorithm’s creator to build something biased or discriminatory. In fact, algorithms like those used to estimate whether an applicant will repay a bank loan are often intended to be as non-discriminatory as possible. A machine learning model of this kind is usually not trained on variables like race or sex, or on other variables that could be used to treat someone unfairly. Instead, it might be trained on things like the words used in loan applications.
A recent study by three economists found that applicants who used any of five words or phrases (God, promise, will pay, thank you, hospital) were much less likely to pay back their loans. A bank might use this finding to try to reduce defaults among its borrowers. However, it would then be unintentionally discriminating against someone who needs a loan to pay a loved one’s hospital bill and would genuinely pay it back, simply because, on average, applicants who mention hospitals are less likely to repay.
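The mechanism is easy to see in code. Below is a minimal sketch, assuming a model that receives a single binary feature derived from the five terms reported in the study; the names `RISK_TERMS`, `flag_risk_terms`, and `has_risk_term` and the whole-word matching rule are illustrative assumptions, not the study’s actual method. Because the feature sees only the words, an applicant who honestly mentions a hospital bill is flagged in exactly the same way as anyone else who uses the word.

```python
import re

# The five words/phrases reported by the study, lowercased for matching.
RISK_TERMS = ["god", "promise", "will pay", "thank you", "hospital"]


def flag_risk_terms(application_text: str) -> list[str]:
    """Return the risk terms that appear in an application (hypothetical feature).

    Matching is case-insensitive and on whole words/phrases, so
    "hospitality" does not count as "hospital".
    """
    text = application_text.lower()
    return [
        term
        for term in RISK_TERMS
        if re.search(r"\b" + re.escape(term) + r"\b", text)
    ]


def has_risk_term(application_text: str) -> int:
    """Binary feature a model might consume: 1 if any risk term appears, else 0."""
    return int(bool(flag_risk_terms(application_text)))


if __name__ == "__main__":
    honest = "I need this loan to cover my mother's hospital bill and will repay it."
    neutral = "I am consolidating two credit cards at a lower interest rate."
    print(flag_risk_terms(honest))  # ['hospital']
    print(has_risk_term(neutral))   # 0
```

Any model that learns a penalty for this feature applies it to every flagged applicant, regardless of that individual’s actual circumstances.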
Another example is a mortgage-approval algorithm trained on geographic information about the quality of the neighborhood an applicant lives in, such as the quality of local schools and of other houses in the area. However, this creates its own problems. Over time, populations can become less diverse and communities segregated, with certain minorities disproportionately living in low-income areas. As a result, by tracking variables that are correlated with race, sex, or age, an algorithm could end up unintentionally targeting certain ethnicities within a city, even though it was never explicitly designed to look for these categories.
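One way to check for this kind of proxy is to ask how well a single feature predicts the protected attribute. The sketch below is a simple, assumed diagnostic rather than a standard named method: it guesses the protected attribute from the majority group within each feature value and compares that accuracy with always guessing the overall majority group. The function name `proxy_strength` and the neighborhood data in the example are hypothetical.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_values):
    """Estimate how well a single feature 'leaks' a protected attribute.

    Predicts the protected attribute as the most common protected value
    within each feature value, and returns (proxy_accuracy, baseline_accuracy).
    The baseline always guesses the overall most common protected value.
    A proxy accuracy well above the baseline suggests the feature could stand
    in for the protected attribute even after that attribute is removed.
    """
    if not feature_values or len(feature_values) != len(protected_values):
        raise ValueError("inputs must be non-empty and the same length")

    # Count protected values separately for each feature value.
    by_feature = defaultdict(Counter)
    for f, p in zip(feature_values, protected_values):
        by_feature[f][p] += 1

    # Number of correct guesses when predicting the per-feature majority.
    correct = sum(counts.most_common(1)[0][1] for counts in by_feature.values())
    n = len(protected_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / n
    return correct / n, baseline


if __name__ == "__main__":
    # Hypothetical toy data: each applicant's neighborhood and group label.
    neighborhoods = ["north", "north", "north", "south", "south", "south", "east", "east"]
    groups = ["A", "A", "A", "B", "B", "A", "B", "B"]
    print(proxy_strength(neighborhoods, groups))  # (0.875, 0.5)
```

Here the neighborhood alone recovers the group label far better than chance, which is exactly the situation in which dropping the group label from the training data would not prevent the model from using it indirectly.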
The examples above illustrate how algorithms can be discriminatory even when their designers try to guard against bias and discrimination. What can be done to help ensure that algorithms don’t accidentally discriminate against groups of people?
Preventing Bias in Algorithms
Data scientists are tackling the issue of biased and discriminatory algorithms head-on by designing tests that can determine whether an algorithm may be introducing bias into its decisions.
Moritz Hardt, a research scientist at Google, recognized the need for “a vetted methodology in machine learning for preventing (this kind of) discrimination based on sensitive attributes.” His research noted that naïve approaches, such as defining a set of “sensitive attributes” and removing them from the data, would fail: even if a particular attribute is no longer present, a combination of other attributes can act as a proxy for it and have the same effect. Instead, Hardt and his colleagues proposed an “equal opportunity” approach to supervised learning, based on the idea that “individuals who qualify for a desired outcome should have an equal chance of being classified for this outcome.” In practice, this means a classifier’s true positive rate (the share of qualified individuals it classifies positively) should be the same across groups. The resulting framework is intended to help detect predictors that violate this criterion and adjust them, while keeping a healthy balance between non-discrimination and classification accuracy.
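The equal opportunity criterion can be checked directly: among individuals who truly qualify, compare the rate at which each group is classified positively. The following is a minimal sketch with hypothetical function names. `threshold_for_tpr` is a simplified stand-in for the post-processing idea of choosing a separate decision threshold per group so that true positive rates line up; it is not the exact procedure from Hardt and colleagues’ paper.

```python
def true_positive_rate(y_true, y_pred):
    """Fraction of truly qualified individuals (y_true == 1) predicted positive."""
    preds_for_qualified = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_qualified:
        raise ValueError("group has no qualified individuals")
    return sum(preds_for_qualified) / len(preds_for_qualified)


def equal_opportunity_gap(y_true, y_pred, groups):
    """Return per-group true positive rates and the largest gap between groups.

    Equal opportunity holds (approximately) when the gap is near zero.
    """
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return rates, max(rates.values()) - min(rates.values())


def threshold_for_tpr(scores, y_true, target_tpr):
    """Highest threshold t such that predicting (score >= t) reaches target_tpr.

    Simplified per-group post-processing: call this once per group's scores.
    """
    if not 0.0 <= target_tpr <= 1.0:
        raise ValueError("target_tpr must be between 0 and 1")
    pos_scores = sorted((s for s, t in zip(scores, y_true) if t == 1), reverse=True)
    if not pos_scores:
        raise ValueError("no qualified individuals")
    n = len(pos_scores)
    # Smallest k qualified individuals (top scores first) giving rate k / n >= target.
    for k in range(1, n + 1):
        if k / n >= target_tpr:
            return pos_scores[k - 1]
    return pos_scores[-1]  # not reached: k == n gives a rate of 1.0


if __name__ == "__main__":
    # Hypothetical labels and predictions for two groups, A and B.
    y_true = [1, 1, 0, 1, 1, 0, 1, 0]
    y_pred = [1, 1, 0, 1, 1, 0, 0, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    print(equal_opportunity_gap(y_true, y_pred, groups))  # ({'A': 1.0, 'B': 0.5}, 0.5)
```

Calling `threshold_for_tpr` on each group’s scores with the same target brings every group’s true positive rate to at least that target, usually at some cost in overall accuracy, which is the trade-off the framework aims to balance.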
Increased transparency has also been proposed as a solution to algorithmic bias. A paper by researchers at Carnegie Mellon University explored a new way of determining how much impact a given input has on an algorithm’s output. This system could be a tool for government bodies or corporations to determine whether an algorithm is discriminating. The system, dubbed “Quantitative Input Influence” (QII), works by testing an algorithm on a wide range of inputs and then estimating which input, or set of inputs, had the largest effect on its output. By measuring the weight, or amount of influence, each input has on a system’s output, it aims to provide “algorithmic transparency.” For example, it could tell you what share of a credit score is determined by a specific variable, such as outstanding bills.
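The core idea behind QII can be sketched as an intervention: replace one input with values drawn from elsewhere in the dataset, breaking its relationship with the other inputs, and measure how often the algorithm’s decision changes. The code below is a simplified, single-input sketch under that assumption, for a classifier that returns 0 or 1; the function name `unary_influence`, the toy credit rule, and the data are hypothetical, and the paper’s method is more general (it also measures the joint influence of sets of inputs).

```python
import random
from typing import Callable, Sequence


def unary_influence(model: Callable[[Sequence[float]], int],
                    data: list[list[float]], feature: int,
                    trials: int = 100, seed: int = 0) -> float:
    """Estimate one input's influence on a classifier's decisions.

    In each trial, the chosen feature of every row is replaced by that feature's
    value from a randomly drawn row of the dataset, and we count how often the
    decision flips. Returns the average fraction of flipped decisions
    (0.0 means the feature never changed an outcome).
    """
    rng = random.Random(seed)
    baseline = [model(row) for row in data]
    flips = 0
    for _ in range(trials):
        for row, original in zip(data, baseline):
            donor = rng.choice(data)          # random row supplies the new value
            intervened = list(row)            # copy so the dataset is unchanged
            intervened[feature] = donor[feature]
            flips += int(model(intervened) != original)
    return flips / (trials * len(data))


def toy_credit_rule(row: Sequence[float]) -> int:
    """Hypothetical rule: approve if income >= 50 and outstanding bills < 3.

    Feature 2 (an unrelated attribute) is ignored by the rule.
    """
    income, bills, _unused = row
    return int(income >= 50 and bills < 3)


if __name__ == "__main__":
    # Hypothetical applicants: [income, outstanding bills, unrelated attribute].
    applicants = [[60, 1, 5], [40, 1, 9], [70, 4, 2], [55, 0, 7]]
    for i, name in enumerate(["income", "bills", "unused"]):
        print(name, unary_influence(toy_credit_rule, applicants, i))
    # "unused" prints 0.0, since the rule never looks at it.
```

A feature with high influence on decisions, especially one correlated with a protected attribute, is a natural candidate for closer scrutiny.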
Tools that can determine whether an algorithm is biased, and so allow it to be corrected, will likely be of great importance in the coming years. There is still a sizeable legal difference between explicit and implicit discrimination, and there is no real precedent for who is held responsible when a computer algorithm discriminates.
While techniques to combat biased algorithms are being investigated, careful attention should be paid to how algorithms and machine learning techniques are applied. Because machine learning models often behave as black boxes, scrutinizing their inputs and outputs for unintended consequences is important to the health and fairness of our society.