(Un)Objective Machines: A Look at Historical Bias in Machine Learning | by Gretel Tan

[ad_1]

A deep dive into biases in machine learning, with a focus on historical (or social) biases.

Humans are biased. To anyone who has had to deal with bigoted individuals, unfair bosses, or oppressive systems — in other words, all of us — this is no surprise. We should thus welcome machine learning models which can help us to make more objective decisions, especially in crucial fields like healthcare, policing, or employment, where prejudiced humans can make life-changing judgements which severely affect the lives of others… right? Well, no. Although we might be forgiven for thinking that machine learning models are objective and rational, biases can be in-built into models in a myraid of ways. In this blog post, we will be focusing on historical biases in machine learning (ML).

In our daily lives, when we invoke bias, we often mean “judgement based on preconceived notions or prejudices, as opposed to the impartial evaluation of facts”. Statisticians also use “bias” to describe pretty much anything which may lead to a systematic disparity between the ‘true’ parameters and what is estimated by the model.

ML models suffer from statistical biases since statistics play a big role in how they work. However, these models are also designed by humans, and use data generated by humans for training, making them vulnerable to learning and perpetuating human biases. Thus, perhaps counterintuitively, ML models are arguably more susceptible to biases than humans, not less.

Experts disagree on the exact number of algorithmic biases, but there are at least 7 potential sources of harmful bias (Suresh & Guttag, 2021), each generated at a different point in the data analysis pipeline:

Historical bias, which arises from the world, in the data generation phase;
Representation bias, which comes about when we take samples of data from the world;
Measurement bias, where the metrics we use or the data we collect might not reflect what we actually want to measure;
Aggregation bias, where we apply the same approach to our whole data set, even though there are subsets which need to be treated differently;
Learning bias, where the ways we have defined our models cause systematic errors;
Evaluation bias, where we ‘grade’ our models’ performances on data which does not actually reflect the population we want to use the models on, and finally;
Deployment bias, where the model is not used in the way the developers intended for it to be used.

Light trail symbolising data streams — Photo by Hunter Harritt on Unsplash

While all of these are important biases, which any budding data scientist should consider, today I will be focusing on historical bias, which occurs at the first stage of the pipeline.

Psst! Interested in learning more about other types of biases? Watch this helpful video:

[ad_2]

Source link

(Un)Objective Machines: A Look at Historical Bias in Machine Learning | by Gretel Tan | Apr, 2024

A deep dive into biases in machine learning, with a focus on historical (or social) biases.

Statistical Solutions

Model-based Solutions

Human-based Solutions

Leave a comment Cancel reply

You May Also Like

An Introduction To Fine-Tuning Pre-Trained Transformers Models | by Ram Vegiraju | Feb, 2024

Visualizing trade flow in Python maps — Part I: Bi-directional trade flow maps | by Himalaya Bir Shrestha | Dec, 2023