Jack Gallifant

Alignment | AI | Healthcare

Massachusetts Institute of Technology


About Me

Exploring the Intersection of Healthcare and AI

As a trained physician and now postdoctoral researcher at MIT, I strive to understand and shape how AI can be aligned with human values, particularly in the realm of healthcare. My goal is to contribute to a future where AI can be used to improve health outcomes for everyone.

Artificial Intelligence

Systems with superhuman capabilities are increasingly possible, and I am deeply interested in understanding these tools at a mechanistic level, ensuring they can operate safely and fairly.

Recent work involves investigating how large language models (LLMs) encode clinical information across subgroups, and exploring ways to mitigate bias.


Integration of AI into healthcare is desperately needed, yet the tools that facilitate safe deployment and monitoring are far less mature than current modelling capabilities.

I develop new methods and frameworks for monitoring, evaluating, and updating AI tools before and after deployment.


Research Focus

Better Benchmarking

Current LLMs and other AI systems consistently outperform humans in many domains. However, the methods used to compare and evaluate them do not faithfully represent the healthcare systems in which they will be deployed. It is therefore important that we develop better methods for interrogating models for safety, efficacy, and bias.

Three key pillars of my research are the following:


Reverse Engineering AI Systems

Employing mechanistic interpretability to demystify black-box AI decision-making processes.


Reducing Cycle Times

Implementing systems for automated feedback on AI tools in the wild.


Setting the Standards

Establishing dynamic benchmarks that test the most up-to-date information in multiple ways.



Latest Updates

Latest updates on research, publications, and events.

  • May 2024: Cross-Care - Benchmarking LLMs

    Evaluating whether LLMs make predictions in line with real-world data.

  • May 2024: The World According to LLMs

    New preprint on the connection between pretraining data and model outputs.

  • April 2024: LLMs for Patient Messaging

    The effect of using a large language model to respond to patient messages is now published in Lancet Digital Health.


    Featured Work

    Selected Projects

    A collection of research and projects that are of particular interest.

    • Cross-Care: Unveiling Biases in Large Language Models

      Evaluating Model Preferences Across Alignment Strategies

      This research initiative delves into the biases inherent in large language models, particularly those used in healthcare applications.

      Through systematic analysis of "The Pile," Cross-Care exposes how pre-training data can skew model outputs, potentially leading to misinformed medical insights.

    • The World According to LLMs

      Understanding the Impact of Training Data on Model Biases

      Building on work showing that prevalence estimates from language models are poorly grounded, we build a pipeline to evaluate their pretraining data and compare model outputs against it.

    • Using LLMs For Patient Messaging

      Understanding the Effectiveness and Safety of LLMs for Patient Portal Messaging

      Responding to patients' questions consumes a significant amount of physician time, and LLMs could help reduce this documentation burden.

      This study evaluates the effectiveness of LLM-drafted responses to real-world questions and the rate of potentially harmful responses.

    • Fairness of AI Metrics

      AUROC and AUPRC under Class Imbalance

      This study disproves the popular belief that AUPRC is the best metric in class-imbalanced settings.

      Using a novel theoretical framework, we show that AUPRC is inherently discriminatory, favouring subgroups with higher prevalence of positive labels.

    • Characterizing UK Health Data Flow

      Mapping NHS Data

      The study explores the UK's NHS data management, uncovering a vast network of data flows across healthcare and research sectors.

      Key findings highlight transparency issues and trust concerns in data handling, alongside prevalent non-compliance with safe data access practices.

    • Developing tools to deploy AI safely

      Disparity Dashboards

      Continuous evaluation of AI models is essential to ensure that they are safe to deploy in the real world.

      Disparity Dashboards systematically and continuously evaluate the impact of AI models on different subgroups of the population.

    • Quantifying digital health inequality across the NHS

      Digital Inequality

      AI models are only as good as the data they are trained on.

      It is essential to understand who is represented in the data, and whose opinions are able to contribute to the model.
