TY - JOUR
T1 - Model-Based Causal Feature Selection for General Response Types
AU - Kook, Lucas
AU - Saengkyongam, Sorawit
AU - Lundborg, Anton Rask
AU - Hothorn, Torsten
AU - Peters, Jonas
N1 - Publisher Copyright:
© 2024 The Author(s). Published with license by Taylor & Francis Group, LLC.
PY - 2024
Y1 - 2024
N2 - Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters, Bühlmann, and Meinshausen) is a method for causal feature selection which requires data from heterogeneous settings and exploits that causal models are invariant. ICP has been extended to general additive noise models and to nonparametric settings using conditional independence tests. However, the latter often suffer from low power (or poor Type I error control) and additive noise models are not suitable for applications in which the response is not measured on a continuous scale, but reflects categories or counts. Here, we develop transformation-model (tram) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses (these model classes, generally, do not allow for identifiability when there is no exogenous heterogeneity). As an invariance test, we propose tram-GCM based on the expected conditional covariance between environments and score residuals with uniform asymptotic level guarantees. For the special case of linear shift trams, we also consider tram-Wald, which tests invariance based on the Wald statistic. We provide an open-source R package tramicp and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
AB - Discovering causal relationships from observational data is a fundamental yet challenging task. Invariant causal prediction (ICP, Peters, Bühlmann, and Meinshausen) is a method for causal feature selection which requires data from heterogeneous settings and exploits that causal models are invariant. ICP has been extended to general additive noise models and to nonparametric settings using conditional independence tests. However, the latter often suffer from low power (or poor Type I error control) and additive noise models are not suitable for applications in which the response is not measured on a continuous scale, but reflects categories or counts. Here, we develop transformation-model (tram) based ICP, allowing for continuous, categorical, count-type, and uninformatively censored responses (these model classes, generally, do not allow for identifiability when there is no exogenous heterogeneity). As an invariance test, we propose tram-GCM based on the expected conditional covariance between environments and score residuals with uniform asymptotic level guarantees. For the special case of linear shift trams, we also consider tram-Wald, which tests invariance based on the Wald statistic. We provide an open-source R package tramicp and evaluate our approach on simulated data and in a case study investigating causal features of survival in critically ill patients. Supplementary materials for this article are available online, including a standardized description of the materials available for reproducing the work.
KW - Invariant causal prediction
KW - Lifetime and survival analysis
KW - Transformation model
U2 - 10.1080/01621459.2024.2395588
DO - 10.1080/01621459.2024.2395588
M3 - Journal article
AN - SCOPUS:85205373257
JO - Journal of the American Statistical Association
JF - Journal of the American Statistical Association
SN - 0162-1459
ER -