Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias
Shan Chen (Harvard; Mass General Brigham; Boston Children's Hospital), Jack Gallifant (MIT), Mingye Gao (MIT), Pedro Moreira (MIT; Universitat Pompeu Fabra), Nikolaj Munch (MIT; Aarhus University), Ajay Muthukkumar (University of North Carolina), Arvind Rajan (University of North Carolina), Jaya Kolluri (Mass General Brigham), Amelia Fiske (Technical University of Munich), Janna Hastings (University of Zurich; University of St. Gallen), Hugo Aerts (Harvard; Mass General Brigham; Maastricht University), Brian Anthony (MIT), Leo Anthony Celi (Harvard; Mass General Brigham; MIT; Beth Israel Deaconess Medical Center), William G. La Cava (Harvard; Boston Children's Hospital), Danielle S. Bitterman (Harvard; Mass General Brigham; Boston Children's Hospital)
The Cross-Care project introduces a new benchmark to systematically assess biases in large language models (LLMs), with a focus on healthcare applications. This research initiative highlights how demographic biases from training datasets, particularly The Pile, can skew LLM outputs, misrepresenting disease prevalence across diverse demographic groups.
Our findings indicate significant discrepancies between the disease prevalence represented in LLMs and actual disease rates within the United States, revealing a pronounced risk of bias propagation. This misalignment has serious implications for medical applications of LLMs, where accurate representation is crucial. The study also explores various alignment strategies, finding them largely ineffective at correcting these biases across different languages.
Cross-Care: Unveiling Biases in Large Language Models
Cross-Care is an innovative research initiative that scrutinizes large language models (LLMs) for their application in healthcare, focusing particularly on uncovering and addressing biases. These biases often stem from the datasets used during model training, such as "The Pile," and can skew how the models represent and generate medical information.
Figure: Workflow diagram illustrating the Cross-Care project's process for analyzing and addressing biases in large language models within healthcare settings.
Investigating Biases in Model Training Data
We began by examining the co-occurrence of biomedical keywords with demographic terms within "The Pile" to determine how often diseases are associated with different demographic groups. This initial analysis revealed significant disparities, such as the overrepresentation of certain demographic groups in disease contexts relative to their real-world disease burden.
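As a rough illustration, this kind of co-occurrence analysis can be sketched as window-based keyword matching. The keyword lists and window size below are hypothetical stand-ins for the project's curated disease and demographic vocabularies, not the actual Cross-Care pipeline:

```python
import re
from collections import Counter
from itertools import product

# Hypothetical keyword lists for illustration only; Cross-Care uses
# curated biomedical and demographic vocabularies.
DISEASES = ["diabetes", "asthma", "hypertension"]
DEMOGRAPHICS = ["male", "female", "black", "white", "hispanic"]

def cooccurrence_counts(documents, window=250):
    """Count disease/demographic keyword pairs appearing within
    `window` characters of each other in each document."""
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        for disease, demo in product(DISEASES, DEMOGRAPHICS):
            for m in re.finditer(r"\b" + re.escape(disease) + r"\b", text):
                start = max(0, m.start() - window)
                end = m.end() + window
                if re.search(r"\b" + re.escape(demo) + r"\b", text[start:end]):
                    counts[(disease, demo)] += 1
    return counts

docs = [
    "A study of diabetes prevalence among Black and Hispanic adults.",
    "Asthma rates in male children were examined.",
]
print(cooccurrence_counts(docs))
```

Aggregating such counts over an entire corpus yields the disease-by-demographic association tables that can then be compared against model outputs and real prevalence data.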
Figure: Visual representation of disease association biases in the training dataset, comparing disease rankings between "The Pile", model logits, and actual U.S. demographic data.
Model Predictions vs. Real-World Data
Further, we analyzed how models like Pythia and Mamba rank diseases across demographics and compared these rankings with actual disease prevalences. Our findings show a mismatch between model predictions and real-world data, suggesting a lack of real-world grounding in these models.
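One way to quantify such a mismatch is a rank correlation between the model-derived disease ordering for a subgroup and the ordering implied by real prevalence data. The rankings below are made-up examples for illustration, not results from the paper:

```python
# Hypothetical orderings: diseases ranked by a model's
# subgroup-conditioned probability vs. by real U.S. prevalence.
model_rank = ["asthma", "diabetes", "hypertension", "covid"]
true_rank = ["hypertension", "diabetes", "covid", "asthma"]

def spearman_rho(rank_a, rank_b):
    """Spearman rank correlation between two orderings of the same items
    (no ties), via the sum-of-squared-rank-differences formula."""
    n = len(rank_a)
    pos_b = {item: i for i, item in enumerate(rank_b)}
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n**2 - 1))

print(spearman_rho(model_rank, true_rank))  # -0.4: weak inverse agreement
```

A coefficient near 1 would indicate the model's ranking tracks real prevalence; values near 0 or below indicate the kind of real-world grounding gap described above.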
Figure: Bar chart depicting the discrepancies between model predictions and demographic realities, highlighting the mismatch in disease prevalence rankings by gender and race.
Exploring Solutions and Strategies
The project not only highlights these issues but also explores strategies to mitigate these biases. This includes examining different alignment strategies and their effectiveness in improving model accuracy and fairness across diverse demographic groups.
Figure: Evaluation of alignment strategies in the Llama series of models, showing how different approaches affect model accuracy across race and gender subgroups.
Cross-Care aims to bridge the gap between model perceptions and reality, enhancing the robustness and applicability of LLMs in healthcare. For further details, tools, and our data visualization platforms, visit the project site.
Our work builds upon insights into how technology can impact outcomes across subgroups:
Notes: While digital transformation offers unprecedented opportunities for advancing healthcare, it also raises complex ethical and legal challenges. Emerging drivers of health disparity, termed digital determinants of health (DDOH), are explored in this piece.
This article can be cited as follows:
Shan Chen, Jack Gallifant, Mingye Gao, Pedro Moreira, Nikolaj Munch, Ajay Muthukkumar, Arvind Rajan, Jaya Kolluri, Amelia Fiske, Janna Hastings, Hugo Aerts, Brian Anthony, Leo Anthony Celi, William G. La Cava, Danielle S. Bitterman. "Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias." Available at arXiv preprint arXiv:2405.05506, 2024.
@misc{chen2024crosscare,
  title={Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias},
  author={Shan Chen and Jack Gallifant and Mingye Gao and Pedro Moreira and Nikolaj Munch and Ajay Muthukkumar and Arvind Rajan and Jaya Kolluri and Amelia Fiske and Janna Hastings and Hugo Aerts and Brian Anthony and Leo Anthony Celi and William G. La Cava and Danielle S. Bitterman},
  year={2024},
  eprint={2405.05506},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}