A new pipeline for the normalization and pooling of metabolomics data

Vivian Viallon*, Mathilde His, Sabina Rinaldi, Marie Breeur, Audrey Gicquiau, Bertrand Hemon, Kim Overvad, Anne Tjønneland, Agnetha Linn Rostgaard-Hansen, Joseph A Rothwell, Lucie Lecuyer, Gianluca Severi, Rudolf Kaaks, Theron Johnson, Matthias B. Schulze, Domenico Palli, Claudia Agnoli, Salvatore Panico, Rosario Tumino, Fulvio Ricceri, W. M. Monique Verschuren, Peter Engelfriet, Charlotte Onland-Moret, Roel Vermeulen, Therese Haugdahl Nøst, Ilona Urbarova, Raul Zamora-Ros, Miguel Rodriguez-Barranco, Pilar Amiano, José Maria Huerta, Eva Ardanaz, Olle Melander, Filip Ottoson, Linda Vidman, Matilda Rentoft, Julie A. Schmidt, Ruth C. Travis, Elisabete Weiderpass, Mattias Johansson, Laure Dossus, Mazda Jenab, Marc J Gunter, Justo Lorenzo Bermejo, Dominique Scherer, Reza M Salek, Pekka Keski-Rahkonen, Pietro Ferrari

*Corresponding author for this work

Publication: Contribution to journal, Journal article, Research, peer review

14 Citations (Scopus)

Abstract

Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges, as several preanalytical and analytical factors can introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use different sample types (e.g., serum versus plasma) that are collected, treated, and stored according to different protocols and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusion of the least informative observations and metabolites, removal of outliers, and imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; and (iii) application of linear mixed models to remove unwanted variability, including that due to the samples’ originating study and batch, and to preserve biological variation while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of the metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. The pipeline can be adapted to normalize other molecular data, including biomarkers and proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility makes the work of potential interest to molecular epidemiologists.
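To make step (iii) more concrete, the following is a minimal sketch of the idea of removing a batch effect with a random-intercept model. It is not the authors' implementation: it handles a single metabolite and a single grouping factor (batch), estimates the variance components with one-way ANOVA moments rather than fitting a full linear mixed model, and ignores the study effect, the fixed effects for biological covariates, and the study-specific residual variances mentioned in the abstract. The function name `remove_batch_effect` and its arguments are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def remove_batch_effect(values, batches):
    """Subtract shrunken batch intercepts from (log-)concentrations.

    Assumed model: y_ij = mu + b_i + e_ij, with b_i ~ N(0, s2_b) and
    e_ij ~ N(0, s2_e). Variance components are estimated by the one-way
    ANOVA method of moments; mu is approximated by the grand mean.
    """
    groups = defaultdict(list)
    for y, b in zip(values, batches):
        groups[b].append(y)

    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two batches and replicated measurements")

    grand_mean = fmean(values)
    batch_mean = {b: fmean(ys) for b, ys in groups.items()}

    # Within- and between-batch mean squares.
    ss_within = sum((y - batch_mean[b]) ** 2 for y, b in zip(values, batches))
    ss_between = sum(len(ys) * (batch_mean[b] - grand_mean) ** 2
                     for b, ys in groups.items())
    ms_within = ss_within / (n - k)
    ms_between = ss_between / (k - 1)

    # Moment estimate of the batch variance, truncated at zero.
    n0 = (n - sum(len(ys) ** 2 for ys in groups.values()) / n) / (k - 1)
    s2_e = ms_within
    s2_b = max(0.0, (ms_between - ms_within) / n0)

    # Shrunken batch effects: small or noisy batches are pulled toward zero.
    effect = {}
    for b, ys in groups.items():
        denom = len(ys) * s2_b + s2_e
        shrink = len(ys) * s2_b / denom if denom > 0 else 0.0
        effect[b] = shrink * (batch_mean[b] - grand_mean)

    # Within-batch differences are untouched; only batch offsets are removed.
    return [y - effect[b] for y, b in zip(values, batches)]


# Illustrative (made-up) log-concentrations measured in two batches.
adjusted = remove_batch_effect(
    [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],
    ["a", "a", "a", "b", "b", "b"],
)
```

Because the batch offsets are shrunken rather than fully subtracted, a batch whose mean differs from the others only by chance is left largely unchanged, which is the main reason to prefer a random effect over simple batch centering. In practice, a dedicated mixed-model library would be used so that study, batch, and biological covariates can be modeled jointly.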

Original language: English
Article number: 631
Journal: Metabolites
Volume: 11
Issue number: 9
Number of pages: 18
ISSN: 2218-1989
DOI
Status: Published - 2021
Published externally: Yes

Bibliographical note

(External)
Funding Information:
The coordination of EPIC is financially supported by the International Agency for Research on Cancer (IARC) and by the Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, which has additional infrastructure support provided by the NIHR Imperial Biomedical Research Centre (BRC). The national cohorts are supported by: Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, Institut National de la Santé et de la Recherche Médicale (INSERM) (France); German Cancer Aid, German Cancer Research Center (DKFZ), German Institute of Human Nutrition Potsdam-Rehbruecke (DIfE), Federal Ministry of Education and Research (BMBF) (Germany); Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy, Compagnia di SanPaolo and National Research Council (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), Statistics Netherlands (The Netherlands); Health Research Fund (FIS) - Instituto de Salud Carlos III (ISCIII), Regional Governments of Andalucía, Asturias, Basque Country, Murcia and Navarra, and the Catalan Institute of Oncology - ICO (Spain); Swedish Cancer Society, Swedish Research Council and County Councils of Skåne and Västerbotten (Sweden); Cancer Research UK (14136 to EPIC-Norfolk; C8221/A29017 to EPIC-Oxford), Medical Research Council (1000143 to EPIC-Norfolk; MR/M012190/1 to EPIC-Oxford) (United Kingdom). IDIBELL acknowledges support from the Generalitat de Catalunya through the CERCA Program. R.Z.-R. would like to thank the “Miguel Servet” program (CPII20/00009) from the Institute of Health Carlos III (Spain) and the European Social Fund (ESF). The breast cancer study (BREA) was funded by the French National Cancer Institute (grant number 2015-166).
The colorectal cancer studies (CLRT1 and CLRT2) were funded by the World Cancer Research Fund (MG; reference: 2013/1002; www.wcrf.org/, accessed on 14 September 2021) and the European Commission (MG; FP7: BBMRI-LPC; reference: 313010; https://ec.europa.eu/, accessed on 14 September 2021). The endometrial cancer study (ENDO) was funded by Cancer Research UK (grant number C19335/A21351). The kidney study (KIDN) was funded by the World Cancer Research Fund (MJ; reference: 2014/1193; www.wcrf.org/, accessed on 14 September 2021) and the European Commission (MJ; FP7: BBMRI-LPC; reference: 313010; https://ec.europa.eu/, accessed on 14 September 2021). The generation of metabolomics data in the gallbladder cancer study (GLBD) was supported by the European Union within the initiative “Biobanking and Biomolecular Research Infrastructure - Large Prospective Cohorts” (Collaborative study “Identification of biomarkers for gallbladder cancer risk prediction - Towards personalized prevention of an orphan disease”) under grant agreement no. 313010 (BBMRI-LPC). The liver cancer study (LIVE) was supported in part by the French National Cancer Institute (L’Institut National du Cancer; INCa; grant numbers 2009-139 and 2014-1-RT-02-CIRC-1; PI: M. Jenab) and by internal funds of the IARC. For the participants in the prostate cancer study (PROS), sample retrieval and preparation and assays of metabolites were supported by Cancer Research UK (C8221/A19170), and funding for grant 2014/1183 was obtained from the World Cancer Research Fund (WCRF UK), as part of the World Cancer Research Fund International grant program. Mathilde His’ work reported here was undertaken during the tenure of a postdoctoral fellowship awarded by the International Agency for Research on Cancer, financed by the Fondation ARC.

Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
