Alignment | AI | Healthcare
Harvard-MGB AIM
Exploring the Intersection of Healthcare and AI
As a trained physician and now a postdoctoral researcher at the Harvard-MGB AI in Medicine Group, I strive to understand and shape how AI can be aligned with human values, particularly in the realm of healthcare. My goal is to contribute to a future where AI is used to improve health outcomes for everyone.
Artificial Intelligence
Systems with superhuman capabilities are increasingly possible, and I am deeply interested in understanding these tools at a mechanistic level to ensure they can operate safely and fairly.
Recent work involves investigating how large language models (LLMs) encode clinical information across subgroups and exploring ways to mitigate bias.
Healthcare
Integration of AI into healthcare is desperately needed, yet the tools that facilitate safe deployment and monitoring are far less mature than current modelling capabilities.
I develop new methods and frameworks for monitoring, evaluating, and updating AI tools pre- and post-deployment.
Better Benchmarking
Current LLMs and other AI systems consistently outperform humans in many domains. However, the methods used to compare and evaluate them do not faithfully represent the healthcare systems in which they will be deployed. It is therefore important that we develop better methods for interrogating models for safety, efficacy, and bias.
My research rests on three key pillars:
Interpretability
Reverse Engineering AI Systems
Employing mechanistic interpretability to demystify black-box AI decision-making processes.
Frameworks
Reducing Cycle Times
Implementing systems for automated feedback on AI deployed in the wild.
Robustness
Setting the Standards
Establishing dynamic benchmarks that test the most up-to-date information in multiple ways.
Latest Updates
The latest updates on research, publications, and events.
Selected Projects
A collection of research and projects that are of particular interest.
Robustness of LLMs across entity swapping
This study investigates the surprising fragility of large language models (LLMs) when faced with drug name variations in biomedical contexts.
Through systematic analysis using the RABBITS framework, we expose how LLMs struggle with brand and generic drug name substitutions, potentially impacting their reliability in healthcare applications.
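As a rough illustration of the entity-swapping idea, the sketch below replaces brand drug names with their generic equivalents in a question and checks whether a model's answers stay consistent. The drug pairs and the query_model stub are illustrative placeholders, not the actual RABBITS benchmark or evaluation code.

```python
# Minimal sketch of a brand/generic swap check in the spirit of RABBITS.
# BRAND_TO_GENERIC and query_model() are illustrative placeholders.

BRAND_TO_GENERIC = {
    "Tylenol": "acetaminophen",
    "Advil": "ibuprofen",
    "Lipitor": "atorvastatin",
}

def swap_entities(question: str, mapping: dict[str, str]) -> str:
    """Return the question with each brand name replaced by its generic."""
    for brand, generic in mapping.items():
        question = question.replace(brand, generic)
    return question

def query_model(question: str) -> str:
    """Placeholder for an LLM call; replace with a real API client."""
    return "A"  # stub answer

def consistency(questions: list[str]) -> float:
    """Fraction of questions answered identically before and after swapping."""
    same = sum(
        query_model(q) == query_model(swap_entities(q, BRAND_TO_GENERIC))
        for q in questions
    )
    return same / len(questions)

if __name__ == "__main__":
    qs = ["Which drug class does Lipitor belong to? A) statin B) NSAID"]
    print(f"Answer consistency under brand->generic swaps: {consistency(qs):.2f}")
```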
Evaluating Model Preferences Across Alignment Strategies
This research initiative delves into the biases inherent in large language models, particularly those used in healthcare applications.
Through systematic analysis of "The Pile," Cross-Care exposes how pre-training data can skew model outputs, potentially leading to misinformed medical insights.
Understanding the Impact of Training Data on Model Biases
Building on work showing that language models ground prevalence estimates poorly, we built a pipeline to evaluate their pretraining data and compare model outputs against it.
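As a hedged sketch of the comparison step, the snippet below correlates disease-demographic co-occurrence counts from a pretraining corpus with model-derived scores for the same groups. All group names, counts, and scores are made-up placeholders and do not come from the Cross-Care pipeline itself.

```python
# Sketch: compare pretraining co-occurrence counts with model-implied
# prevalence rankings. Numbers below are hypothetical.
from scipy.stats import spearmanr

# Hypothetical co-occurrence counts of a disease with demographic terms
# extracted from a pretraining corpus such as The Pile.
corpus_counts = {"group_a": 1200, "group_b": 450, "group_c": 90}

# Hypothetical model-derived scores for the same disease-group pairs.
model_scores = {"group_a": 0.61, "group_b": 0.28, "group_c": 0.11}

groups = sorted(corpus_counts)
rho, p = spearmanr(
    [corpus_counts[g] for g in groups],
    [model_scores[g] for g in groups],
)
print(f"Rank agreement between corpus counts and model outputs: rho={rho:.2f}")
```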
Understanding the Effectiveness and Safety of LLMs for patient portal messaging
Responding to patient portal messages consumes a significant amount of physician time, and LLM-drafted replies could help reduce this documentation burden.
This study evaluates the effectiveness of LLM responses to real-world patient questions and measures the rate of potentially harmful responses.
AUROC and AUPRC under Class Imbalance
This study disproves the popular belief that AUPRC is the better metric in class-imbalanced settings.
Using a novel theoretical framework, we show that AUPRC is inherently discriminatory, favouring subgroups with a higher prevalence of positive labels.
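A minimal simulation, with made-up score distributions, illustrates the underlying point: when the same classifier is applied to subgroups that differ only in prevalence, AUROC stays roughly constant while AUPRC falls with prevalence. This is a sketch for intuition, not the theoretical framework from the paper.

```python
# Simulate two subgroups with identical class-conditional score
# distributions but different positive-label prevalence.
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

def simulate(n: int, prevalence: float) -> tuple[np.ndarray, np.ndarray]:
    """Draw labels at the given prevalence and scores from fixed
    class-conditional distributions (the same classifier for every subgroup)."""
    y = rng.binomial(1, prevalence, size=n)
    scores = rng.normal(loc=y.astype(float), scale=1.0)  # positives score higher on average
    return y, scores

for name, prev in [("high-prevalence subgroup", 0.30), ("low-prevalence subgroup", 0.03)]:
    y, s = simulate(50_000, prev)
    print(
        f"{name}: AUROC={roc_auc_score(y, s):.3f}  "
        f"AUPRC={average_precision_score(y, s):.3f}"
    )
# AUROC is roughly equal across subgroups, while AUPRC drops sharply
# in the low-prevalence subgroup.
```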
Mapping NHS Data
The study explores the UK's NHS data management, uncovering a vast network of data flows across healthcare and research sectors.
Key findings highlight transparency issues and trust concerns in data handling, alongside prevalent non-compliance with safe data access practices.
Disparity Dashboards
Continuous evaluation of AI models is essential to ensure that they are safe to deploy in the real world.
Disparity Dashboards systematically and continuously evaluate the impact of AI models on different subgroups of the population.
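A minimal sketch of the underlying computation, assuming a hypothetical prediction log with subgroup, label, and score columns (not a published schema), is shown below: recompute a metric per subgroup over a monitoring window and surface the gap between groups.

```python
# Illustrative disparity-dashboard computation: one row per prediction,
# AUROC recomputed per subgroup. Column names and values are assumptions.
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical prediction log from a deployed model.
log = pd.DataFrame({
    "subgroup": ["A", "A", "A", "B", "B", "B", "B", "A"],
    "label":    [1,   0,   1,   0,   1,   0,   0,   0],
    "score":    [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3],
})

def subgroup_auroc(df: pd.DataFrame) -> pd.Series:
    """AUROC per subgroup; NaN if a subgroup contains only one class."""
    return df.groupby("subgroup").apply(
        lambda g: roc_auc_score(g["label"], g["score"])
        if g["label"].nunique() > 1 else float("nan")
    )

dashboard = subgroup_auroc(log)
print(dashboard)
print("Max disparity:", dashboard.max() - dashboard.min())
```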
Digital Inequality
AI models are only as good as the data they are trained on.
It is essential to understand who is represented in the data, and whose perspectives are able to contribute to the model.