Abstract
Background: Variation in laboratory healthcare data due to seasonal changes is a widely accepted phenomenon, yet seasonal variation is generally not systematically accounted for in healthcare settings. This study applies a newly developed adjustment method for seasonal variation to analyze the effect seasonality has on machine learning model classification of diagnoses.

Methods: Machine learning models were trained and tested on ~22 million unique records from ~575,000 unique patients admitted to Danish hospitals. Four machine learning models (AdaBoost, decision tree, neural net, and random forest) classifying 35 diseases of the circulatory system (ICD-10 diagnosis codes, chapter IX) were run before and after seasonal adjustment of 23 laboratory reference intervals (RIs). The effect of the adjustment was benchmarked via its contribution to machine learning models trained using hyperparameter optimization and assessed quantitatively using performance metrics (AUROC and AUPRC).

Results: Seasonally adjusted RIs significantly improved cardiovascular disease classification in 24 of the 35 tested cases when using neural net models. Features with the highest average feature importance (via SHAP explainability) across all disease models were sex, C-reactive protein, and estimated glomerular filtration rate. Classification of diseases of the vessels, such as thrombotic diseases and other atherosclerotic diseases, consistently improved after seasonal adjustment.

Conclusions: As data volumes increase and data-driven methods become more advanced, it is essential to improve data quality at the pre-processing level. This study presents a method that makes it feasible to introduce seasonally adjusted RIs into the clinical research space in any disease domain. Seasonally adjusted RIs generally improve diagnosis classification, and seasonal variation therefore ought to be accounted for in clinical decision support methods.
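To make the before/after comparison described in the abstract concrete, the following is a minimal sketch in plain Python. It is not the study's adjustment method, which the abstract does not specify: the `seasonally_adjust` function is a hypothetical stand-in that removes each calendar month's median offset, and the toy values, months, and labels are invented purely for illustration. The raw lab value is used directly as a classification score here, whereas the study feeds adjusted RIs into trained models. `auroc` and `average_precision` are small hand-written versions of the two reported metric families, with average precision serving as a common step-wise estimate of AUPRC.

```python
from collections import defaultdict
from statistics import median


def seasonally_adjust(values, months):
    """Remove each month's median offset from the overall median.

    Hypothetical stand-in for illustration; NOT the method used in the study.
    """
    overall = median(values)
    by_month = defaultdict(list)
    for value, month in zip(values, months):
        by_month[month].append(value)
    offset = {m: median(vs) - overall for m, vs in by_month.items()}
    return [value - offset[month] for value, month in zip(values, months)]


def auroc(labels, scores):
    """Fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (tied scores keep input order)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Invented toy data: a lab value that runs higher in January than in July,
# so that healthy winter samples overlap with diseased summer samples.
values = [12, 13, 16, 17, 8, 9, 12, 13]
months = [1, 1, 1, 1, 7, 7, 7, 7]
labels = [0, 0, 1, 1, 0, 0, 1, 1]

adjusted = seasonally_adjust(values, months)

print("AUROC raw / adjusted:", auroc(labels, values), auroc(labels, adjusted))
print("AP    raw / adjusted:",
      average_precision(labels, values), average_precision(labels, adjusted))
```

On this toy data the seasonal offset is the only source of overlap between classes, so removing it separates them cleanly. Real laboratory data will rarely behave this neatly, and library implementations such as those in scikit-learn would normally be preferred for the metrics.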
Original language | English |
---|---|
Article number | 62 |
Journal | BMC Medical Informatics and Decision Making |
Volume | 24 |
Issue number | 1 |
Number of pages | 11 |
ISSN | 1472-6947 |
DOI | |
Status | Published - 2024 |
Bibliographical note
Funding Information: We would like to thank the Novo Nordisk Foundation and the Danish Innovation Fund for their funding support of this project. This study has been approved by The Danish Data Protection Agency (ref: 514–0255/18–3000, 514–0254/18–3000, SUND-2016–50), The Danish Health Data Authority (ref: FSEID-00003724 and FSEID-00003092) and The Danish Patient Safety Authority (3–3013-1731/1/). The study has been approved as a registry study, for which patient consent is not needed in Denmark.
Funding Information:
Open access funding provided by Copenhagen University. This research was supported by the Novo Nordisk Foundation (NNF14CC0001 and NNF17OC0027594) and the Danish Innovation Fund (5184-00102B). V. Muse is the recipient of a fellowship from the Novo Nordisk Foundation as part of the Copenhagen Bioscience Ph.D. Program, supported through grant NNF19SA0035440.
Publisher Copyright:
© The Author(s) 2024.