Seasonally adjusted laboratory reference intervals to improve the performance of machine learning models for classification of cardiovascular diseases

Victorine P. Muse, Davide Placido, Amalie D. Haue, Søren Brunak*

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review


Abstract

Background: Variation in laboratory healthcare data due to seasonal changes is a widely accepted phenomenon, yet it is generally not systematically accounted for in healthcare settings. This study applies a newly developed adjustment method for seasonal variation to analyze how seasonality affects machine learning classification of diagnoses.

Methods: Machine learning models were trained and tested on ~22 million unique records from ~575,000 unique patients admitted to Danish hospitals. Four machine learning models (AdaBoost, decision tree, neural net, and random forest) classifying 35 diseases of the circulatory system (ICD-10 diagnosis codes, chapter IX) were run before and after seasonal adjustment of 23 laboratory reference intervals (RIs). The effect of the adjustment was benchmarked via its contribution to machine learning models trained using hyperparameter optimization and assessed quantitatively using performance metrics (AUROC and AUPRC).

Results: Seasonally adjusted RIs significantly improved cardiovascular disease classification in 24 of the 35 tested cases when using neural net models. Features with the highest average feature importance (via SHAP explainability) across all disease models were sex, C-reactive protein, and estimated glomerular filtration rate. Classification of diseases of the vessels, such as thrombotic diseases and other atherosclerotic diseases, consistently improved after seasonal adjustment.

Conclusions: As data volumes increase and data-driven methods become more advanced, it is essential to improve data quality at the pre-processing level. This study presents a method that makes it feasible to introduce seasonally adjusted RIs into the clinical research space in any disease domain. Seasonally adjusted RIs generally improve diagnosis classification and thus ought to be considered and adjusted for in clinical decision support methods.
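The abstract does not spell out how the seasonal adjustment is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a simple month-median shift: the reference interval for a laboratory test is moved up or down by the difference between that month's median and the overall median. The function names (`monthly_offsets`, `adjust_reference_interval`, `flag`), the synthetic values, and the static interval of 42 to 58 are all hypothetical and chosen purely for illustration.

```python
import math
import statistics
from collections import defaultdict


def monthly_offsets(records):
    """Return {month: median of that month's values - overall median}.

    `records` is a list of (month, value) pairs, with month in 1..12.
    Assumption: a month-median shift stands in for the paper's method.
    """
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)
    overall = statistics.median(value for _, value in records)
    return {m: statistics.median(vs) - overall for m, vs in by_month.items()}


def adjust_reference_interval(ri, offsets, month):
    """Shift a (low, high) reference interval by the month's offset."""
    low, high = ri
    shift = offsets.get(month, 0.0)
    return (low + shift, high + shift)


def flag(value, ri):
    """Classify a value as 'low', 'normal', or 'high' relative to an RI."""
    low, high = ri
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


# Synthetic laboratory values with a yearly cosine cycle peaking in January.
records = [
    (m, 50.0 + 10.0 * math.cos(2 * math.pi * (m - 1) / 12) + d)
    for m in range(1, 13)
    for d in (-2.0, 0.0, 2.0)
]

offsets = monthly_offsets(records)
static_ri = (42.0, 58.0)  # hypothetical season-independent interval
january_ri = adjust_reference_interval(static_ri, offsets, 1)
july_ri = adjust_reference_interval(static_ri, offsets, 7)

# The same measurement is flagged differently depending on the interval used.
print(flag(60.0, static_ri), flag(60.0, january_ri), flag(60.0, july_ri))
```

In this toy setup, a value that looks abnormal against the static interval can be typical for its month, which changes the abnormality flag that a downstream classifier would receive as a feature. Comparing model performance (for example AUROC and AUPRC) on flags derived from static versus adjusted intervals is the kind of before/after benchmark the abstract describes.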

Original language: English
Article number: 62
Journal: BMC Medical Informatics and Decision Making
Volume: 24
Issue number: 1
Number of pages: 11
ISSN: 1472-6947
Publication status: Published - 2024

Bibliographical note

Publisher Copyright: © The Author(s) 2024.

Keywords

  • Cardiovascular Disease
  • Diagnostics
  • Digital Health
  • Electronic Health Records
  • Laboratory Values
  • Machine Learning
  • Seasonality
