Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
Shan Chen (Harvard; Mass General Brigham; Boston Children's Hospital), Jack Gallifant (MIT), Mingye Gao (MIT), Pedro Moreira (MIT; Universitat Pompeu Fabra), Nikolaj Munch (MIT; Aarhus University), Ajay Muthukkumar (University of North Carolina), Arvind Rajan (University of North Carolina), Jaya Kolluri (Mass General Brigham), Amelia Fiske (Technical University of Munich), Janna Hastings (University of Zurich; University of St. Gallen), Hugo Aerts (Harvard; Mass General Brigham; Maastricht University), Brian Anthony (MIT), Leo Anthony Celi (Harvard; Mass General Brigham; MIT; Beth Israel Deaconess Medical Center), William G. La Cava (Harvard; Boston Children's Hospital), Danielle S. Bitterman (Harvard; Mass General Brigham; Boston Children's Hospital)
The Cross-Care project introduces a new benchmark to systematically assess biases in large language models (LLMs), with a focus on healthcare applications. This research initiative highlights how demographic biases from training datasets, particularly The Pile, can skew LLM outputs, misrepresenting disease prevalence across diverse demographic groups.
Our findings indicate significant discrepancies between the disease prevalence represented in LLMs and actual disease rates within the United States, revealing a pronounced risk of bias propagation. This misalignment has serious implications for medical applications of LLMs, where accurate representation is crucial. The study also explores various alignment strategies, finding them largely ineffective at correcting these biases across different languages.
Cross-Care: Unveiling Biases in Large Language Models
Cross-Care is an innovative research initiative that scrutinizes large language models (LLMs) for their application in healthcare, focusing particularly on uncovering and addressing biases. These biases often stem from the datasets used during model training, such as "The Pile," and can skew how the models represent and generate medical information.
Figure: Workflow diagram illustrating the Cross-Care project's process for analyzing and addressing biases in large language models within healthcare settings.
Investigating Biases in Model Training Data
We began by examining the co-occurrence of biomedical keywords with demographic terms within "The Pile" to determine how often diseases are associated with different demographic groups. This initial analysis revealed significant disparities, such as the overrepresentation of certain demographic groups in disease contexts relative to their real-world disease burden.
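As a rough illustration, this kind of co-occurrence analysis can be sketched as window-based keyword matching. The keyword lists and window size below are hypothetical stand-ins for the project's curated disease and demographic vocabularies, not the actual Cross-Care pipeline:

```python
import re
from collections import Counter
from itertools import product

# Hypothetical keyword lists for illustration only; Cross-Care uses
# curated biomedical and demographic vocabularies.
DISEASES = ["diabetes", "asthma", "hypertension"]
DEMOGRAPHICS = ["male", "female", "black", "white", "hispanic"]

def cooccurrence_counts(documents, window=250):
    """Count disease/demographic keyword pairs appearing within
    `window` characters of each other in each document."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        for disease, demo in product(DISEASES, DEMOGRAPHICS):
            for m in re.finditer(r"\b" + re.escape(disease) + r"\b", text):
                start = max(0, m.start() - window)
                end = m.end() + window
                if re.search(r"\b" + re.escape(demo) + r"\b", text[start:end]):
                    counts[(disease, demo)] += 1
    return counts

docs = [
    "A study of diabetes prevalence among Black and Hispanic adults.",
    "Asthma rates in male children were examined.",
]
print(cooccurrence_counts(docs))
```

Aggregating such counts over an entire corpus yields the disease-by-demographic association tables that can then be compared against model outputs and real prevalence data.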
Figure: Visual representation of disease association biases in the training dataset, comparing disease rankings between "The Pile", model logits, and actual U.S. demographic data.
Model Predictions vs. Real-World Data
Further, we analyzed how models like Pythia and Mamba rank diseases across demographics and compared these rankings with actual disease prevalences. Our findings show a mismatch between model predictions and real-world data, suggesting a lack of real-world grounding in these models.
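One way to quantify such a mismatch is a rank correlation between the model-derived disease ordering for a subgroup and the ordering implied by real prevalence data. The rankings below are made-up examples for illustration, not results from the paper:

```python
# Hypothetical orderings: diseases ranked by a model's
# subgroup-conditioned probability vs. by real U.S. prevalence.
model_rank = ["asthma", "diabetes", "hypertension", "covid"]
true_rank = ["hypertension", "diabetes", "covid", "asthma"]

def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation between two orderings of the same items
    (no ties), via the sum-of-squared-rank-differences formula."""
    n = len(rank_a)
    pos_b = {item: i for i, item in enumerate(rank_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n**2 - 1))

print(spearman_rho(model_rank, true_rank))  # -0.4: weak inverse agreement
```

A coefficient near 1 would indicate the model's ranking tracks real prevalence; values near 0 or below indicate the kind of real-world grounding gap described above.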
Figure: Bar chart depicting the discrepancies between model predictions and demographic realities, highlighting the mismatch in disease prevalence rankings by gender and race.
Exploring Solutions and Strategies
The project not only highlights these issues but also explores strategies to mitigate these biases. This includes examining different alignment strategies and their effectiveness in improving model accuracy and fairness across diverse demographic groups.
Figure: Evaluation of alignment strategies in the Llama series of models, showing how different approaches affect model accuracy across race and gender subgroups.
Cross-Care aims to bridge the gap between model perceptions and reality, enhancing the robustness and applicability of LLMs in healthcare. For further details, tools, and our data visualization platforms, visit the project site.
Our work builds upon insights into how technology can impact outcomes across subgroups:
Notes: While digital transformation offers unprecedented opportunities for advancing healthcare, it also raises complex ethical and legal challenges. Emerging drivers of health disparity, termed digital determinants of health (DDOH), are explored in this piece.
This article can be cited as follows:
Shan Chen, Jack Gallifant, Mingye Gao, Pedro Moreira, Nikolaj Munch, Ajay Muthukkumar, Arvind Rajan, Jaya Kolluri, Amelia Fiske, Janna Hastings, Hugo Aerts, Brian Anthony, Leo Anthony Celi, William G. La Cava, Danielle S. Bitterman. "Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias." Available at arXiv preprint arXiv:2405.05506, 2024.
@misc{chen2024crosscare,
  title={Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias},
  author={Shan Chen and Jack Gallifant and Mingye Gao and Pedro Moreira and Nikolaj Munch and Ajay Muthukkumar and Arvind Rajan and Jaya Kolluri and Amelia Fiske and Janna Hastings and Hugo Aerts and Brian Anthony and Leo Anthony Celi and William G. La Cava and Danielle S. Bitterman},
  year={2024},
  eprint={2405.05506},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}