DataSHIELD: Mitigating disclosure risk in a multi-site federated analysis platform

Demetris Avraam*, Rebecca C. Wilson, Soumya Banerjee, Noemi Aguirre Chan, Tom R P Bishop, Olly Butters, Tim Cadman, Luise Cederkvist Kristiansen, Liesbeth Duijts, Xavier Escriba-Montagut, Hugh Garner, Gonçalo Gonçalves, Juan R González, Sido Haakma, Mette Hartlev, Jan Hasenauer, Manuel Huth, Eleanor Hyde, Vincent W V Jaddoe, Yannick Marcon, Michaela Th Mayrhofer, Fruzsina Molnár-Gábor, Andrei Scott Morgan, Madeleine J. Murtagh, Marc Nestor, Anne-Marie Nybo Andersen, Simon Parker, Angela Pinot de Moira, Florian Schwarz, Katrine Strandberg-Larsen, Morris A Swertz, Marieke Welten, Stuart Wheater, Paul Burton

*Corresponding author for this work

Research output: Contribution to journal, Journal article, Research, peer-review

Abstract

Motivation
The validity of epidemiologic findings can be increased using triangulation, i.e. comparison of findings across contexts, and by having sufficiently large amounts of relevant data to analyse. However, access to data is often constrained by practical considerations and by ethico-legal and data governance restrictions. Gaining access to such data can be time-consuming due to the governance requirements associated with data access requests to institutions in different jurisdictions.

Results
DataSHIELD is a software solution that enables remote analysis without the need for data transfer (federated analysis). DataSHIELD is a scientifically mature, open-source data access and analysis platform aligned with the ‘Five Safes’ framework, the international framework governing safe research access to data. It allows real-time analysis while mitigating disclosure risk through an active multi-layer system of disclosure-preventing mechanisms. This combination of real-time remote statistical analysis, disclosure prevention mechanisms, and federation capabilities makes DataSHIELD a solution for addressing many of the technical and regulatory challenges in performing the large-scale statistical analysis of health and biomedical data. This paper describes the key components that comprise the disclosure protection system of DataSHIELD. These broadly fall into three classes: (i) system protection elements, (ii) analysis protection elements, and (iii) governance protection elements.
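To make the idea of federated analysis with an analysis-level disclosure check concrete, the following is a minimal Python sketch. It is not DataSHIELD's API or implementation: the names `SiteSummary`, `site_summary`, `pooled_mean`, and the threshold `MIN_SUBSET_SIZE` are all hypothetical, chosen only for illustration. Each site returns aggregate statistics rather than individual-level records, and refuses to answer when the subset is too small to release safely.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical minimum number of records a site must hold before it
# releases any summary; real deployments choose their own value.
MIN_SUBSET_SIZE = 3


@dataclass
class SiteSummary:
    """Non-disclosive aggregates returned by one site."""
    total: float
    count: int


def site_summary(values: Sequence[float],
                 min_size: int = MIN_SUBSET_SIZE) -> Optional[SiteSummary]:
    """Runs at the data site. Returns aggregates only, or None when the
    subset is smaller than the release threshold."""
    if len(values) < min_size:
        return None
    return SiteSummary(total=float(sum(values)), count=len(values))


def pooled_mean(summaries: Sequence[Optional[SiteSummary]]) -> float:
    """Runs at the analyst's client. Combines site-level aggregates;
    individual-level data never leave the sites."""
    usable = [s for s in summaries if s is not None]
    if not usable:
        raise ValueError("no site returned a releasable summary")
    return sum(s.total for s in usable) / sum(s.count for s in usable)


if __name__ == "__main__":
    site_a = [1.0, 2.0, 3.0, 4.0]   # released: 4 records
    site_b = [10.0, 20.0]           # blocked: below the threshold
    results = [site_summary(site_a), site_summary(site_b)]
    print(pooled_mean(results))
```

In this sketch the pooled estimate uses only the sites that passed the check. A production system would layer further protections around this step, as the three classes of protection elements listed above suggest.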
Original language: English
Article number: vbaf046
Journal: Bioinformatics Advances
Volume: 5
Issue number: 1
Number of pages: 10
Publication status: Published - 2025
