Progress in deep learning is exploding. The last three years have seen technological improvements that let researchers do things people could only have dreamed of a decade ago. We’ve seen fully computer-generated faces, AI-composed songs, and cutting-edge language models that produce work indistinguishable from a human’s. But our growing reliance on these tools also means a growing reliance on potentially biased data, and that leads to an important question: how do we avoid, or at least minimize, bias in machine learning applications?

First and foremost, AI is a tool. The unfortunate reality is that it is up to us as humans to decide how that tool should be used. To fully appreciate the scope of the problem, we need to consider the impact of bias in data, bias in algorithms, and ultimately how both affect the decisions machines make.

A recent example of bias in machine learning data comes from Microsoft, which was forced to apologize after its AI chatbot Tay was “taught” some decidedly racist ideas by Twitter users over a short period of time. Tay was designed to learn about conversation from human interlocutors on Twitter; unfortunately, it learned from some deeply unsavory people instead. The researchers at Microsoft had trained Tay on a data set of anonymized Tweets, called a “corpus,” which they hadn’t filtered for offensive content. They had no specific goal in mind for the experiment; they simply wanted Tay to learn conversational skills, so they gave it access to Twitter to learn from conversations among human users. By contrast, Google’s DeepMind built AlphaGo by first studying thousands of games played by human experts and then adjusting its algorithms accordingly, rather than starting with generic knowledge about the game and seeking out specific examples of play (see the AlphaGo paper).
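Tay’s failure suggests an obvious guardrail: screen the corpus before the model ever sees it. Below is a minimal sketch of that kind of pre-training content filter, assuming a hypothetical blocklist and toy tweets; a production pipeline would use a trained toxicity classifier rather than simple keyword matching.

```python
# Minimal sketch: filter a conversational training corpus before training.
# The blocklist and corpus here are hypothetical placeholders; a real
# pipeline would use a trained toxicity classifier, not keyword matching.

OFFENSIVE_TERMS = {"slur_1", "slur_2"}  # placeholder; maintain a real list separately

def is_clean(tweet: str) -> bool:
    """Reject tweets containing any blocklisted term (case-insensitive)."""
    words = tweet.lower().split()
    return not any(term in words for term in OFFENSIVE_TERMS)

def filter_corpus(tweets: list[str]) -> list[str]:
    """Keep only tweets that pass the content screen."""
    return [t for t in tweets if is_clean(t)]

raw_corpus = ["nice to meet you!", "contains slur_1 here", "how is your day?"]
training_corpus = filter_corpus(raw_corpus)
print(training_corpus)  # -> 2 of the 3 tweets survive the screen
```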
In addition to flawed data, sometimes it is the algorithm itself that is biased in some particular way. This is particularly important where there is a procedural aspect to the generation of information, i.e., algorithms that aren’t purely deep-learning based. In these situations, human bias can lead to procedural decisions that consistently make the wrong move. For example, recommendation algorithms (like those used at Facebook, Instagram, and Netflix) typically try to connect products or users with similar products or users. They rely on what are called ‘predicted group traits’: thousands of subsets of the population are generated automatically, and individuals are placed into one or more of these groups. Each group then receives different product or user recommendations, and much of the time these recommendations reproduce broad ethnic, gender, socio-economic, or racial stereotypes, as when black Facebook account owners were reportedly over 50% more likely to have their accounts automatically disabled.

It’s important for companies building artificial intelligence systems not just to create them, but also to monitor how they function over time and to intervene when clear examples of preferential or biased data manifest.
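To make ‘predicted group traits’ concrete, here is a minimal sketch of a group-based recommender, with made-up user features and a placeholder item catalog. Users are clustered automatically from behavioral signals, and every member of a cluster gets the same recommendation; if cluster membership happens to correlate with ethnicity, gender, or income, the stereotyping described above falls straight out of the design.

```python
# Minimal sketch of a "predicted group traits" recommender (illustrative only).
# The user features and item catalog are made up; real systems use far richer signals.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
user_features = rng.normal(size=(100, 5))   # hypothetical behavioral features

# Step 1: automatically generate user subgroups from behavior alone.
groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(user_features)

# Step 2: every member of a group gets that group's top item.
# If group membership correlates with ethnicity, gender, or income,
# that correlation is baked into every recommendation.
top_item_per_group = {g: f"item_{g}" for g in range(4)}  # placeholder catalog
recommendations = [top_item_per_group[g] for g in groups]
print(recommendations[:10])
```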
How to Avoid Bias in Machine Learning
1. Investigate quickly
Anomalous behavior should be investigated as soon as possible, not least because even a system that isn’t yet performing well enough for commercial use can serve as a test case for studying bias-related issues in future iterations of the algorithm. This will certainly slow algorithmic deployment, but on the flip side, it leads to higher-quality products with better market value and a longer lifespan.
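As an illustration of what investigating quickly can look like in practice, the sketch below routinely compares outcome rates across user groups and flags anomalies for human review. The records, group labels, and 25% threshold are illustrative assumptions, not a standard.

```python
# Minimal sketch: flag per-group outcome disparities for investigation.
# Group labels, outcomes, and the 25% threshold are illustrative assumptions.
from collections import defaultdict

def disparity_report(records, threshold=0.25):
    """records: (group, outcome) pairs, where outcome 1 = adverse action taken.
    Flags groups whose adverse-action rate deviates from the overall rate
    by more than `threshold` (relative)."""
    totals, adverse = defaultdict(int), defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        adverse[group] += outcome
    overall = sum(adverse.values()) / sum(totals.values())
    flags = {}
    for group in totals:
        rate = adverse[group] / totals[group]
        if overall and abs(rate - overall) / overall > threshold:
            flags[group] = rate
    return overall, flags

records = [("a", 1), ("a", 0), ("a", 1), ("b", 0), ("b", 0), ("b", 1), ("b", 0)]
overall, flags = disparity_report(records)
print(f"overall rate {overall:.2f}, flagged groups: {flags}")
```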
2. Maintain Data Transparency
The second way we can prevent machine learning systems from becoming biased is through transparency about what has been learned. This means clear, accountable documentation, open-sourcing where appropriate, and the ability to distinguish between what has been learned through supervised learning (carefully hand-picked samples) and through unsupervised learning (i.e., mass training on unaltered datasets). A side note: a few years ago, Google began publishing research about this distinction on its DeepMind blog after discovering that its systems were developing some troubling biases simply through exposure to an enormous amount of unsupervised content on YouTube. With over 500 hours of video uploaded every minute, you’re bound to get a few bad apples.
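As a sketch of what such transparency could look like in code (the schema below is an assumption for illustration, not an established standard), a team might record for every training set whether it was hand-curated or mass-collected, and publish that record alongside the model:

```python
# Minimal sketch of per-dataset provenance documentation ("datasheet" style).
# Field names and values are illustrative, not an established schema.
from dataclasses import dataclass, asdict
import json

@dataclass
class DatasetRecord:
    name: str
    source: str
    learning_mode: str        # "supervised" (hand-picked) vs "unsupervised" (mass-collected)
    filtered_for_abuse: bool  # was offensive content screened out?
    license: str

records = [
    DatasetRecord("curated_dialogues_v1", "internal annotation team",
                  "supervised", True, "internal"),
    DatasetRecord("public_video_comments", "bulk web scrape",
                  "unsupervised", False, "mixed/unknown"),
]

# Publishing this alongside the model makes it auditable which behaviors
# came from curated data and which from unfiltered mass collection.
print(json.dumps([asdict(r) for r in records], indent=2))
```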
3. Improve research responsibility
The third way we can avoid bias is by increasing social responsibility on the part of the artificial intelligence researchers and engineers who build these systems. It isn’t enough for the businesses and consumers that later use these products to understand the nuances; to improve the quality of future models, that ethical responsibility must begin with the research itself. We must also make an effort not only to build unbiased AI, but to continuously monitor AI systems and ensure they are used appropriately once they are deployed commercially or otherwise adopted into existing environments.

That’s a wrap! We sincerely hope you enjoyed our guide on how to avoid bias in machine learning.
What are you waiting for? Give CrankWheel a test run and see for yourself just how easy it is to instantly share your screen with distracted or less tech-savvy prospects. We think you and your sales team will be so blown away by how easy and fast CrankWheel is that you’ll want to start onboarding the entire team.