TY - JOUR
T1 - Constructing causal life course models
T2 - Comparative study of data-driven and theory-driven approaches
AU - Petersen, Anne Helby
AU - Ekstrøm, Claus Thorn
AU - Spirtes, Peter
AU - Osler, Merete
N1 - © The Author(s) 2023. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
PY - 2023
Y1 - 2023
N2 - Life course epidemiology relies on specifying complex (causal) models that describe how variables interplay over time. Traditionally, such models have been constructed by perusing existing theory and previous studies. By comparing data-driven and theory-driven models, we investigate whether data-driven causal discovery algorithms can help this process. We focus on a longitudinal dataset following a cohort of Danish men. The theory-driven models are constructed by two subject-field experts. The data-driven models are constructed using the temporal Peter-Clark (TPC) algorithm, which utilizes the temporal information embedded in life course data. We find that the data-driven models recover some, but not all, causal relationships included in the theory-driven expert models. The data-driven method is especially good at identifying direct causal relationships that the experts have high confidence in. Moreover, in a post hoc assessment, we found that most of the direct causal relationships proposed by the data-driven model, but not included in the theory-driven model, were plausible. Thus, the data-driven model may propose additional meaningful causal hypotheses that are new or have been overlooked by the experts. In conclusion, data-driven methods can aid causal model construction in life course epidemiology, and combining data-driven and theory-driven methods can lead to even stronger models.
AB - Life course epidemiology relies on specifying complex (causal) models that describe how variables interplay over time. Traditionally, such models have been constructed by perusing existing theory and previous studies. By comparing data-driven and theory-driven models, we investigate whether data-driven causal discovery algorithms can help this process. We focus on a longitudinal dataset following a cohort of Danish men. The theory-driven models are constructed by two subject-field experts. The data-driven models are constructed using the temporal Peter-Clark (TPC) algorithm, which utilizes the temporal information embedded in life course data. We find that the data-driven models recover some, but not all, causal relationships included in the theory-driven expert models. The data-driven method is especially good at identifying direct causal relationships that the experts have high confidence in. Moreover, in a post hoc assessment, we found that most of the direct causal relationships proposed by the data-driven model, but not included in the theory-driven model, were plausible. Thus, the data-driven model may propose additional meaningful causal hypotheses that are new or have been overlooked by the experts. In conclusion, data-driven methods can aid causal model construction in life course epidemiology, and combining data-driven and theory-driven methods can lead to even stronger models.
U2 - 10.1093/aje/kwad144
DO - 10.1093/aje/kwad144
M3 - Journal article
C2 - 37344193
SN - 0002-9262
VL - 192
SP - 1917
EP - 1927
JO - American Journal of Epidemiology
JF - American Journal of Epidemiology
IS - 11
ER -