Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias

Shan Chen1,2,3, Jack Gallifant4, Mingye Gao4, Pedro Moreira4,10, Nikolaj Munch4,5, Ajay Muthukkumar6, Arvind Rajan6, Jaya Kolluri2, Amelia Fiske7, Janna Hastings8, Hugo Aerts1,2,9, Brian Anthony4, Leo Anthony Celi1,2,4,11, William G. La Cava1,3, Danielle S. Bitterman1,2,3

Addressing Biases in Language Models for Healthcare

The Cross-Care project introduces a new benchmark to systematically assess biases in large language models (LLMs), with a focus on healthcare applications. This research initiative highlights how demographic biases from training datasets, particularly The Pile, can skew LLM outputs, misrepresenting disease prevalence across diverse demographic groups.

Our findings indicate significant discrepancies between disease prevalence as represented by LLMs and actual disease rates within the United States, pointing to a pronounced risk of bias propagation. This misalignment has serious implications for medical applications of LLMs, where accurate representation is crucial. The study also explores various alignment strategies, revealing their limited effectiveness in correcting these biases across different languages.

Cross-Care: Unveiling Biases in Large Language Models

Cross-Care is a research initiative that scrutinizes large language models (LLMs) intended for healthcare applications, focusing on uncovering and addressing biases. These biases often stem from the datasets used during model training, such as "The Pile," which can skew the models' perception of and outputs about medical information.

Cross-Care Workflow Diagram

Workflow diagram illustrating the Cross-Care project's process for analyzing and addressing biases in large language models within healthcare settings.

Investigating Biases in Model Training Data

We began by examining the co-occurrence of biomedical keywords with demographic terms within "The Pile" to determine how often diseases are associated with different demographic groups. This initial analysis revealed significant disparities, such as the overrepresentation of certain demographic groups relative to their real-world disease burden.
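The co-occurrence analysis described above can be sketched as a windowed pair count. This is a minimal illustration, not the project's actual pipeline: the keyword lists, window size, and tokenization here are placeholder assumptions, and the real Cross-Care lexicons of disease and demographic terms are far larger.

```python
from collections import Counter
from itertools import product

# Hypothetical keyword lists for illustration only.
DISEASES = ["diabetes", "asthma"]
DEMOGRAPHICS = ["male", "female", "black", "white"]

def cooccurrence_counts(documents, window=250):
    """Count (disease, demographic) keyword pairs that appear within
    `window` tokens of each other in each document."""
    counts = Counter()
    for doc in documents:
        tokens = doc.lower().split()
        # Token positions of every keyword in this document.
        positions = {w: [i for i, t in enumerate(tokens) if t == w]
                     for w in DISEASES + DEMOGRAPHICS}
        for d, g in product(DISEASES, DEMOGRAPHICS):
            counts[(d, g)] += sum(
                1 for i in positions[d] for j in positions[g]
                if abs(i - j) <= window)
    return counts

docs = ["the female patient was diagnosed with asthma",
        "diabetes prevalence among male veterans"]
counts = cooccurrence_counts(docs)
```

Normalizing such counts within each disease then yields a corpus-derived "prevalence ranking" over demographic subgroups that can be compared against model outputs and epidemiological data.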

Disease Ranking by Demographic

Visual representation of disease association biases in the training dataset, comparing disease rankings between 'The Pile', model logits, and actual U.S. demographic data.

Model Predictions vs. Real-World Data

Further, we analyzed how models like Pythia and Mamba rank diseases across demographics and compared these rankings with actual disease prevalences. Our findings show a mismatch between model predictions and real-world data, suggesting a lack of real-world grounding in these models.
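One simple way to quantify the mismatch between a model's subgroup ranking and the real-world ranking is a rank correlation such as Kendall's tau. The sketch below is illustrative only: the subgroup lists are hypothetical, and the paper's actual evaluation may use different statistics.

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall rank correlation between two rankings, each given as a
    list of the same items ordered from most to least associated."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # Same sign: the pair is ordered the same way in both rankings.
        agree = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if agree > 0:
            concordant += 1
        elif agree < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical subgroup rankings for a single disease.
model_ranking = ["white", "black", "hispanic", "asian"]
real_ranking = ["black", "hispanic", "white", "asian"]
tau = kendall_tau(model_ranking, real_ranking)
```

A tau near 1 would indicate that the model's ordering tracks real prevalence; values near 0 or below indicate the kind of real-world grounding failure described above.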

Top Ranked Gender and Race Subgroups Across Diseases

Bar chart depicting the discrepancies between model predictions and demographic realities, highlighting the mismatch in disease prevalence rankings by gender and race.

Exploring Solutions and Strategies

The project not only highlights these issues but also explores strategies to mitigate them, examining different alignment approaches and their effectiveness in improving model accuracy and fairness across diverse demographic groups.

Alignment Strategies in Llama Series Models

Evaluation of alignment strategies in the Llama series models, showing how different approaches affect model accuracy across race and gender subgroups.

Cross-Care aims to bridge the gap between model perceptions and reality, enhancing the robustness and applicability of LLMs in healthcare. For further details, tools, and access to our data visualization platforms, visit our project site.

Related Work

Our work builds upon insights into how technology can impact outcomes across subgroups:


Gallifant, J., Celi, L.A. & Pierce, R.L. Digital determinants of health: opportunities and risks amidst health inequities. 2023.

Notes: While digital transformation offers unprecedented opportunities for advancing healthcare, it also raises complex ethical and legal challenges. This piece explores emerging drivers of health disparity termed digital determinants of health (DDOH).

How To Cite

This article can be cited as follows:

Bibliography

Shan Chen, Jack Gallifant, Mingye Gao, Pedro Moreira, Nikolaj Munch, Ajay Muthukkumar, Arvind Rajan, Jaya Kolluri, Amelia Fiske, Janna Hastings, Hugo Aerts, Brian Anthony, Leo Anthony Celi, William G. La Cava, Danielle S. Bitterman. "Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias." arXiv preprint arXiv:2405.05506, 2024.

BibTeX

@misc{chen2024crosscare,
  title={Cross-Care: Assessing the Healthcare Implications of Pre-training Data on Language Model Bias},
  author={Shan Chen and Jack Gallifant and Mingye Gao and Pedro Moreira and Nikolaj Munch and Ajay Muthukkumar and Arvind Rajan and Jaya Kolluri and Amelia Fiske and Janna Hastings and Hugo Aerts and Brian Anthony and Leo Anthony Celi and William G. La Cava and Danielle S. Bitterman},
  year={2024},
  eprint={2405.05506},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}