The Sociodemographic Biases in Machine Learning Algorithms: A Biomedical Informatics Perspective

Gillian Franklin, Rachel Stephens, Muhammad Piracha, Shmuel Tiosano, Frank Lehouillier, Ross Koppel, Peter L. Elkin*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Artificial intelligence models represented in machine learning algorithms are promising tools for the risk assessments used to guide clinical and other health care decisions. Machine learning algorithms, however, may harbor biases that propagate stereotypes, inequities, and discrimination, contributing to socioeconomic health care disparities. These biases include those related to sociodemographic characteristics such as race, ethnicity, gender, age, insurance, and socioeconomic status, arising from the use of erroneous electronic health record data. Additionally, training-data and algorithmic biases in large language models pose potential drawbacks. These biases affect the lives and livelihoods of a significant percentage of the population in the United States and globally, and the social and economic consequences of the associated backlash cannot be overstated. Here, we outline sociodemographic, training-data, and algorithmic biases that undermine sound health care risk assessment and medical decision-making and that should be addressed in the health care system. We present a perspective and overview of these biases by gender, race, ethnicity, age, and historically marginalized communities, covering algorithmic bias, biased evaluations, implicit bias, selection/sampling bias, socioeconomic status bias, biased data distributions, cultural bias, insurance status bias, confirmation bias, information bias, and anchoring bias. We also make recommendations to improve large language model training data, including de-biasing techniques such as counterfactual role-reversed sentences during knowledge distillation, fine-tuning, prefix attachment at training time, the use of toxicity classifiers, retrieval-augmented generation, and algorithmic modification to mitigate these biases moving forward.
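One of the de-biasing techniques named in the abstract, counterfactual role-reversed sentences, is commonly implemented as counterfactual data augmentation: each training sentence is mirrored with demographic terms swapped so the model sees both versions. The sketch below is an illustrative simplification, not the authors' method; the `GENDER_SWAPS` table and `counterfactual` function are hypothetical names, and a real pipeline would need part-of-speech disambiguation (e.g., "her" maps to either "him" or "his") and a far larger term list.

```python
# Minimal sketch of counterfactual ("role-reversed") sentence generation.
# Assumption: a simple token-level swap table; real systems disambiguate
# pronouns with POS tagging and cover many more demographic terms.
GENDER_SWAPS = {
    "he": "she", "she": "he",
    "him": "her", "her": "his",   # ambiguous in general; "his" chosen here
    "his": "her", "hers": "his",
    "man": "woman", "woman": "man",
    "male": "female", "female": "male",
}

def counterfactual(sentence: str) -> str:
    """Return a role-reversed copy of `sentence` by swapping gendered tokens."""
    out = []
    for token in sentence.split():
        stripped = token.rstrip(".,;:!?")   # separate trailing punctuation
        punct = token[len(stripped):]
        core = stripped.lower()
        if core in GENDER_SWAPS:
            swap = GENDER_SWAPS[core]
            if stripped[0].isupper():        # preserve capitalization
                swap = swap.capitalize()
            out.append(swap + punct)
        else:
            out.append(token)
    return " ".join(out)
```

During augmentation, both the original sentence and `counterfactual(sentence)` would be included in the training (or distillation) corpus, so the model cannot learn associations tied to one demographic form alone.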

Original language: English
Article number: 652
Journal: Life
Volume: 14
Issue number: 6
DOIs
State: Published - Jun 2024
Externally published: Yes

Funding

Funders and funder numbers:
U.S. Department of Veterans Affairs
NIAAA: R21AA026954, R33AA0226954
NCATS: UL1TR001412
NIH NLM: R25LM014213, T15LM012495

Keywords

• algorithms
• artificial intelligence
• bias
• biomedical informatics
• electronic health records
• health care
• machine learning
• models
• sociodemographic
