Mapping and Evaluating National Data Flows: Transparency, Privacy, and Guiding Infrastructural Transformation

Joe Zhang, BMBCh1,2, Jess Morley, MS3, Jack Gallifant, MSc4,5, Chris Oddy, MBBS6, Prof James T Teo, PhD2,7, Prof Hutan Ashrafian, PhD1,8, Prof A Darzi, PhD1
DOI: 10.1016/S2589-7500(23)00157-7
joe.zhang@imperial.ac.uk

Navigating the Complexities of NHS Data Sharing

This study scrutinizes how electronic health record data from the UK National Health Service (NHS) are shared, revealing significant challenges. It maps data flows to over 460 entities across the academic, commercial, and public sectors. The findings show that multistage data flow chains reduce transparency, jeopardizing public trust. Moreover, most data interactions fail to meet best practices for secure access, raising privacy concerns. The existing infrastructure also produces duplicate data, diminishing the diversity and value of the data. Recommendations for infrastructure transformation, together with a new website (DataInsights.uk), aim to enhance transparency and showcase NHS data assets.

Data Flow Patterns in NHS England

NHS England, comprising 216 hospital trusts and 6,544 primary care providers, manages healthcare interactions for a population of about 56 million. Figure 1 illustrates the national data flows, highlighting four primary models of data extraction: 1) structured clinical codes from primary care EHRs, 2) administrative data from secondary care by NHS Digital, 3) data aggregation within regional shared care record data warehouses, and 4) proprietary secondary care data pipelines.

Clusters and NHS data flows

Figure 1. Electronic patient data flows in NHS England. Data flows go upwards and are coloured by destination. For data sources and extractors, node size is proportional to population catchment (eg, NHS Digital=55 million). For data consumers, node size is proportional to the number of projects (eg, University of Oxford=178). NHS=National Health Service.

These models vary in the resolution and type of data extracted, ranging from standard clinical codes to high-resolution data from secondary care. The visual representation in Figure 1, with data flow directions and node sizes, provides an insightful overview of the data extraction sources and their reach.
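One way to reason about multistage data flow chains is to treat the map as a directed graph and list every path from a data source to its final consumers. The following is a minimal Python sketch under stated assumptions, not the method used in the study: every node name and edge is a hypothetical placeholder, and `build_graph` and `chains_from` are illustrative helpers introduced here.

```python
from collections import defaultdict

# Hypothetical flows (source -> destination). These are placeholders,
# not data from the study.
FLOWS = [
    ("primary_care_ehr", "national_extractor"),
    ("primary_care_ehr", "regional_warehouse"),
    ("national_extractor", "university_x"),
    ("regional_warehouse", "analytics_company_y"),
    ("analytics_company_y", "pharma_company_z"),
]


def build_graph(flows):
    """Build an adjacency list from (source, destination) pairs."""
    graph = defaultdict(list)
    for src, dst in flows:
        graph[src].append(dst)
    return graph


def chains_from(graph, start):
    """Return every path from `start` to a node with no onward flow."""
    paths = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        # Skip nodes already on the path so a cycle cannot loop forever.
        onward = [n for n in graph.get(path[-1], []) if n not in path]
        if not onward:
            paths.append(path)
        for node in onward:
            stack.append(path + [node])
    return sorted(paths)


if __name__ == "__main__":
    graph = build_graph(FLOWS)
    for chain in chains_from(graph, "primary_care_ehr"):
        # The number of hops is the number of stages in the chain.
        print(len(chain) - 1, "hops:", " -> ".join(chain))
```

In this toy graph the longest chain has three hops, which illustrates how a consumer can sit several stages away from the point of extraction.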

Secondary Use Ecosystem and Top Data Consumers

Figure 2. Voronoi chart showing the top eight consumers of NHS data in each of six categories, by number of projects discovered during the study period.

The NHS data, as revealed in Figure 2, feeds a diverse and extensive ecosystem of secondary uses, involving over 460 non-NHS organizations. These entities, which have accessed, maintained, or utilized NHS data since April 2021, include a wide array of sectors such as academia, pharmaceuticals, life sciences, and non-profits. Prominent among these are 216 universities, 143 companies in life sciences and data analytics, and 44 non-profit organizations. The figure also shows the eight top consumers across six categories, demonstrating the dominant forms of data use, which span research studies, publications, audits, and various forms of partnerships. This comprehensive view underlines the significant reach and impact of NHS data beyond its immediate healthcare context.

Balance and Diversity of NHS Data Assets

Figure 3. Individual data assets per extractor type, showing volume of data types and linkages

The data extractors within the NHS vary significantly in type and volume of data maintained, acting as multipliers in the data distribution network. Figure 3 highlights this diversity, showing primary care data as the most prevalent type maintained. Whole-population primary care data are accessible for COVID-19 research and through platforms like OpenSAFELY. The figure also reveals an overlap in data extractions, with some primary care practices reporting data extraction by multiple databases, indicating substantial duplication. This comprehensive view underscores the complex landscape of data assets within the NHS, from primary care records to shared care and regional systems, each contributing to a vast, yet intricate web of data flows.
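The duplication described here can be quantified by counting how many databases extract from each practice. The sketch below is illustrative only: the practice and database names are hypothetical, and `duplication_summary` is a helper introduced for this example rather than anything from the study.

```python
from collections import Counter

# Hypothetical mapping of practice -> databases that extract its data.
# These are placeholders, not figures from the study.
EXTRACTIONS = {
    "practice_1": {"database_a", "database_b"},
    "practice_2": {"database_a"},
    "practice_3": {"database_a", "database_b", "database_c"},
}


def duplication_summary(extractions):
    """Summarise how many practices are extracted by more than one database."""
    by_count = Counter(len(dbs) for dbs in extractions.values())
    duplicated = sum(1 for dbs in extractions.values() if len(dbs) > 1)
    total = len(extractions)
    return {
        "practices": total,
        "duplicated": duplicated,
        "share_duplicated": duplicated / total if total else 0.0,
        "by_extractor_count": dict(by_count),
    }


if __name__ == "__main__":
    print(duplication_summary(EXTRACTIONS))
```

A high share of duplicated practices would suggest that additional databases add copies of the same records rather than new coverage.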

NHS Data Transformation Recommendations

  • Enhance Public Transparency: Ensure transparent reporting of data usage at various dissemination nodes to prevent the need for investigative discovery and protect against data breaches.
  • Revise Opt-Out Conditions: Set opt-out options at the level of data distribution to different consumer types, rather than at the point of extraction, to maintain patient autonomy and access to data-driven interventions.
  • Utilize Existing Infrastructure: Improve and expand the use of current infrastructure, like NHS Digital and OpenSAFELY, through public outreach and education, before introducing new Secure Data Environments (SDEs).
  • Develop New Data Infrastructure: Focus on extracting untapped secondary care EHR data and enhancing multimodal data availability, rather than redistributing existing data. Consider a national federated data platform for regional analytics, emphasizing privacy and reducing bulk data transfers.
  • Focus on Intervention Capabilities: Shift infrastructure development towards interventions (not just analysis), including faster data provision, improved regulatory processes, and AI production capabilities. Leverage regional centers for developing such infrastructure.
  • Assess Monetary Value Transfer: Evaluate the financial flow across data chains to determine the value return to the healthcare system and establish beneficial revenue models for both patients and providers.
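The recommendation to set opt-outs at the level of distribution, rather than extraction, can be sketched as a filter applied when data are released to a given consumer type. This is a hypothetical illustration, not an existing NHS mechanism: the `OptOut` class, the `releasable` function, the record fields, and the consumer type labels are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class OptOut:
    """Hypothetical per-patient preference, keyed by consumer type."""
    patient_id: str
    blocked_consumer_types: FrozenSet[str] = field(default_factory=frozenset)


def releasable(records: List[dict], opt_outs: Dict[str, OptOut],
               consumer_type: str) -> List[dict]:
    """Return the records that may be distributed to `consumer_type`.

    Extraction is not affected: every record stays in the source dataset,
    and the preference is applied only at the point of distribution.
    """
    released = []
    for record in records:
        pref = opt_outs.get(record["patient_id"])
        if pref is None or consumer_type not in pref.blocked_consumer_types:
            released.append(record)
    return released


if __name__ == "__main__":
    records = [{"patient_id": "p1"}, {"patient_id": "p2"}]
    opt_outs = {"p1": OptOut("p1", frozenset({"commercial"}))}
    print(releasable(records, opt_outs, "commercial"))
    print(releasable(records, opt_outs, "academic"))
```

Under this design a patient who blocks commercial use still appears in data released for academic use, so the extracted data remain available for other purposes.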

Related Work

Our work builds upon insights from related work that has examined the risks and incentives of health data sharing:

Watson, Hope*, Gallifant, Jack, Lai, Yuan et al., Delivering on NIH data sharing requirements: avoiding Open Data in Appearance Only. 2023.

Notes: This work proposes a framework that states the main risks associated with data sharing, systematically presents risk mitigation strategies, and provides examples through a healthcare lens. To move towards Open Data, mechanisms for incentivisation must be created, beginning with recentring data sharing on patient benefits.

Citation Details

For academic referencing, please cite this work as follows.

Bibliography

Joe Zhang, Jess Morley, Jack Gallifant, Chris Oddy, James T Teo, Hutan Ashrafian, Brendan Delaney, Ara Darzi, "Mapping and evaluating national data flows: transparency, privacy, and guiding infrastructural transformation," The Lancet Digital Health, Volume 5, Issue 10, 2023, Pages e737-e748, ISSN 2589-7500, [https://doi.org/10.1016/S2589-7500(23)00157-7](https://www.sciencedirect.com/science/article/pii/S2589750023001577).

BibTeX

@article{zhang2023mapping,
  title={Mapping and evaluating national data flows: transparency, privacy, and guiding infrastructural transformation},
  author={Zhang, Joe and Morley, Jess and Gallifant, Jack and Oddy, Chris and Teo, James T and Ashrafian, Hutan and Delaney, Brendan and Darzi, Ara},
  journal={The Lancet Digital Health},
  volume={5},
  number={10},
  pages={e737-e748},
  year={2023},
  publisher={Elsevier},
  doi={10.1016/S2589-7500(23)00157-7},
  url={https://www.sciencedirect.com/science/article/pii/S2589750023001577}
}