Directed acyclic graphs and causal thinking in clinical risk prediction modeling

Piccininni, Marco; Konigorski, Stefan; Rohmann, Jessica L.; Kurth, Tobias

doi:10.1186/s12874-020-01058-z

BMC Medical Research Methodology

Table 1 Simulation Results: Prediction Tools’ Performance Metrics

	Logistic, Markov Blanket set (Nsim=100,000)	Logistic, all 24 variables (Nsim=100,000)	Logistic, any variables with a path to the outcome (Nsim=100,000)	Logistic, node’s parent variables (Nsim=100,000)	Lasso, all 24 variables (Nsim=100,000)	Ridge, all 24 variables (Nsim=100,000)	Elastic net, all 24 variables (Nsim=100,000)	Random forest, all 24 variables (Nsim=100,000)
FULL RESULTS: Including all simulated datasets
ICI
N Missing	8032	0	8032	37,272	8597	0	8612	1
Mean (SD)	0.01882 (0.00445)	0.01964 (0.00495)	0.01900 (0.00461)	0.02215 (0.00421)	0.01912 (0.00451)	0.03807 (0.02058)	0.01907 (0.00456)	0.04133 (0.01779)
Median	0.01857	0.01925	0.01867	0.02242	0.01888	0.02895	0.01881	0.03636
Range	0.00290–0.03834	0.00289–0.04330	0.00287–0.04330	0.00290–0.03826	0.00287–0.03919	0.00710–0.18537	0.00340–0.04283	0.00704–0.16493
Number of input variables
N Missing	0	0	0	0	0	0	0	0
Mean (SD)	4.0 (2.8)	24.0 (0.0)	18.9 (7.0)	1.2 (1.3)	24.0 (0.0)	24.0 (0.0)	24.0 (0.0)	24.0 (0.0)
Median	3.0	24.0	22.0	1.0	24.0	24.0	24.0	24.0
Range	0.0–19.0	24.0–24.0	0.0–24.0	0.0–9.0	24.0–24.0	24.0–24.0	24.0–24.0	24.0–24.0
Direct comparison: ICI of various methods compared to Markov Blanket-based logistic tool
N Missing	8032	8032	8032	37,272	9140	8032	9147	8033
< ICI logistic MB, N (%)		39,354 (42.79%)	39,540 (42.99%)	4864 (7.75%)	26,514 (29.18%)	8871 (9.65%)	31,089 (34.22%)	1650 (1.79%)
≥ ICI logistic MB, N (%)		52,614 (57.21%)	52,428 (57.01%)	57,864 (92.25%)	64,346 (70.82%)	83,097 (90.35%)	59,764 (65.78%)	90,317 (98.21%)
COMPLETE CASE RESULTS: only including datasets for which ICI could be estimated for all tools
ICI
N Missing	37,841	37,841	37,841	37,841	37,841	37,841	37,841	37,841
Mean (SD)	0.01956 (0.00463)	0.01975 (0.00477)	0.01970 (0.00473)	0.02211 (0.00421)	0.01995 (0.00471)	0.03886 (0.02177)	0.01990 (0.00476)	0.04049 (0.02011)
Median	0.01953	0.01962	0.01960	0.02238	0.01993	0.02883	0.01987	0.03283
Range	0.00290–0.03834	0.00289–0.04330	0.00287–0.04330	0.00290–0.03826	0.00287–0.03919	0.00710–0.18537	0.00340–0.04283	0.00704–0.16493
Number of input variables
N Missing	37,841	37,841	37,841	37,841	37,841	37,841	37,841	37,841
Mean (SD)	4.1 (2.7)	24.0 (0.0)	20.8 (3.9)	1.9 (1.1)	24.0 (0.0)	24.0 (0.0)	24.0 (0.0)	24.0 (0.0)
Median	4.0	24.0	22.0	2.0	24.0	24.0	24.0	24.0
Range	1.0–19.0	24.0–24.0	1.0–24.0	1.0–9.0	24.0–24.0	24.0–24.0	24.0–24.0	24.0–24.0
Direct comparison: ICI of various methods compared to Markov Blanket-based logistic tool
N Missing	37,841	37,841	37,841	37,841	37,841	37,841	37,841	37,841
< ICI logistic MB, N (%)		26,872 (43.23%)	27,124 (43.64%)	4850 (7.80%)	16,887 (27.17%)	6508 (10.47%)	19,959 (32.11%)	1636 (2.63%)
≥ ICI logistic MB, N (%)		35,287 (56.77%)	35,035 (56.36%)	57,309 (92.20%)	45,272 (72.83%)	55,651 (89.53%)	42,200 (67.89%)	60,523 (97.37%)

In a series of 100,000 simulated datasets, we obtained these results for ICI and number of input variables for the eight investigated prediction tools. Both the full results and the complete case results are presented; the complete case results include only datasets for which ICI could be estimated for all tools.
Abbreviations: ICI, integrated calibration index; MB, Markov Blanket; Nsim, number of simulations; SD, standard deviation
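
The percentages in the direct comparison rows appear to be computed over the datasets in which the comparison could be made, that is, Nsim minus the corresponding N Missing count. This denominator is an inference from the table's own counts, not something the table states. The following minimal Python sketch recomputes a few of the reported percentages under that assumption; the names `N_SIM`, `FULL_RESULTS`, and `pct_below` are hypothetical and are not taken from the article.

```python
# Sketch only: recompute the "< ICI logistic MB" percentages from Table 1,
# ASSUMING the denominator is Nsim minus the "N Missing" count of that comparison.
N_SIM = 100_000  # number of simulated datasets (Nsim in the table)

# (tool, N missing, N datasets with ICI below the Markov Blanket tool), full results
FULL_RESULTS = [
    ("Logistic, all 24 variables", 8032, 39_354),
    ("Logistic, node's parent variables", 37_272, 4864),
    ("Random forest, all 24 variables", 8033, 1650),
]


def pct_below(n_missing, n_below, n_sim=N_SIM):
    """Percentage of comparable datasets in which the tool's ICI is lower."""
    comparable = n_sim - n_missing  # datasets where both ICIs were estimable
    return round(100 * n_below / comparable, 2)


for tool, n_missing, n_below in FULL_RESULTS:
    print(f"{tool}: {pct_below(n_missing, n_below):.2f}%")
```

The same function applies to the complete case rows, where N Missing is 37,841 for every tool, so all comparisons share one denominator of 62,159 datasets.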

ISSN: 1471-2288
