The rise of multiple imputation: a review of the reporting and implementation of the method in medical research

Hayati Rezvan, Panteha; Lee, Katherine J; Simpson, Julie A

doi:10.1186/s12874-015-0022-1

BMC Medical Research Methodology

Table 4 Reporting of MI procedure in articles using multiple imputation

Characteristics reported	Trials (n = 73), N (%)*	Observational studies (n = 30), N (%)*	All studies (n = 103), N (%)*
Imputation details
Any imputation details provided^a	60 (82)	27 (90)	87 (85)
Imputation method stated	29 (40)	9 (30)	38 (37)
MI using chained equations (MICE)	14	6	20
MI using multivariate normal model (MVNI)^b	7	1	8
MI using predictive mean matching (PMM)	1	0	1
MI using regression-based imputation^c	4	1	5
MI using MICE & PMM^d	1	1	2
MI using propensity score	1	0	1
MI using propensity score or regression modelling^e	1	0	1
General procedure/command specified	5 (7)	2 (7)	7 (7)
Proc MI	4	1	5
MI command	0	1	1
Model-based MI^f	1	0	1
Imputation method inferred	11 (15)	10 (33)	21 (20)
MICE (SAS- IVEware)	1	2	3
MICE (Stata- pre V11)	1	2	3
MICE (multiple packages^g)	1	0	1
MVNI (SAS- pre V9.3-imputed more than 1 variable)	5	1	6
MVNI (R-Amelia II)	0	2	2
MVNI (S-plus)	2	0	2
Regression-based imputation (SAS pre V9.3-imputed 1 categorical variable)	1	3	4
Non-normal variables transformed prior to imputation	6 (8)	6 (20)	12 (12)
Log transformation^h	4	4	8
Logit transformation	0	1	1
General comment about applying normalising transformation	2	1	3
Provided details on the variables included in the imputation model	26 (36)	13 (43)	39 (38)
Included auxiliary variable(s)	6	4	10
Included interaction term(s)	2	2	4
Included auxiliary variable and interaction	3	2	5
No information provided on auxiliary variables and interaction terms	15	5	20
Number of imputations	28 (38)	19 (63)	47 (46)
≤5	8	3	11
10	6	3	9
11-50	8	6	14
100	4	6	10
>100	2	1	3
Carried out diagnostic checks of the imputation model^i	0 (0)	2 (7)	2 (2)
Assessed differences between results obtained from CC/LOCF and MI in the text/table^j	45 (62)	17 (57)	62 (60)
Software details
Imputation software stated^k,l	51 (70)	25 (83)	76 (74)
SAS	23	10	33
Stata	18	9	27
R	6	6	12
Other packages (SOLAS, S-plus, SPSS)	4	0	4
Analysis status of MI
MI used in the primary analysis	26 (36)	12 (40)	38 (37)
MI used as a secondary analysis	47 (64)	19 (63)	66^l (64)
Methods used for primary analysis if MI applied as a secondary analysis
Complete case analysis (CC)^m,n	43	19	62
Last observation carried forward (LOCF)	4	0	4
Sensitivity analysis following MI	3 (4)	0 (0)	3 (3)
Pattern-mixture model approach	1	0	1
Selection model approach	0	0	0
Performed but the method not stated^o	2	0	2

*Unless otherwise stated.
Abbreviations: MI- multiple imputation, MICE- multiple imputation by chained equations, MVNI- multivariate normal imputation, PMM- predictive mean matching, MCMC- Markov chain Monte Carlo, CC- complete case, LOCF- last observation carried forward.
^aAny information provided by the authors with regard to the imputation process. Note: a general procedure/command stated by the authors, and the imputation methods that were inferred by the reviewers are not included in this category.
^bIn five articles [35,61,68,90] MI via MCMC algorithm was used for imputing missing data.
^cLogistic regression was stated as the imputation method for handling missing data in three articles [40,47,84], and linear regression in two articles [39,113].
^dTwo articles [61,93] imputed one or two variables with missing data under PMM (because of non-normality), and imputed other incomplete variables under MICE.
^eOne article [91] stated that MI was used on the basis of either propensity scoring or regression modelling for imputation of missing data in the primary and secondary outcome measures.
^fOne article [51] stated that model-based MI was used to account for missing data in the clinical outcome.
^gIn one article [77] multiple packages were used for the analyses, i.e. SPSS version 15.0 and Stata version 10.1. The default imputation method in either of these packages (given the specified versions) was chained equations.
^hOne article [93] used both the square root and log transformations for non-normally distributed variables.
^iBoth articles [82,130] compared the observed and imputed data.
^jThe MI estimates were not provided in 6 articles [34,37,81,85,87,120]; instead, the results of the different approaches for dealing with the missing data were compared in the text (e.g. the analysis of complete cases and the imputed data provided the same results).
^kFor eight articles [59,77,81,88,94,96,115,127] it was not possible to extract this information because multiple packages for the statistical analyses were mentioned with no explicit statement regarding which package was used for imputation.
^lThose articles that did not provide the name of the imputation software (R, Stata, SAS, etc.), but instead gave the name of the procedure/application used for imputing missing data (e.g. Amelia II, IVEware) were also included here.
^mOne article [99] used MI as well as CC for primary analysis to impute the missing confounder values (with no imputation of missing data in the exposure and outcome), and used MI again as a sensitivity analysis to impute missing data in all confounders and the outcome (but not the exposure), as well as a CC.
^nTwo articles [40,100] used LOCF for the secondary analysis as well as MI; one of them described the MI as a sensitivity analysis.
^oA general statement was made about performing a sensitivity analysis but the results or details were not provided.
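
The N (%) entries in the table can be rechecked from the counts and the column denominators (73 trials, 30 observational studies, 103 articles in total). The sketch below is illustrative only and is not taken from the article: the helper name `pct` and the assumption that percentages are rounded to whole numbers are ours. It reproduces the "Imputation method stated" row, confirms that its sub-rows in the trials column sum to the row total, and shows that the primary and secondary analysis counts sum to one more than the number of articles, which appears consistent with footnote m describing one article that used MI in both roles.

```python
def pct(count, denom):
    """Percentage of `count` out of `denom`, rounded to a whole number (assumed convention)."""
    return round(100 * count / denom)

# Column denominators taken from the table header.
N_TRIALS, N_OBS, N_ALL = 73, 30, 103

# "Imputation method stated" row: 29 trials, 9 observational studies, 38 overall.
stated = [pct(29, N_TRIALS), pct(9, N_OBS), pct(38, N_ALL)]
print(stated)  # [40, 30, 37]

# Sub-rows of "Imputation method stated" in the trials column
# (MICE, MVNI, PMM, regression, MICE & PMM, propensity score, propensity or regression).
trial_methods = [14, 7, 1, 4, 1, 1, 1]
assert sum(trial_methods) == 29

# "MI used in the primary analysis" (38) plus "MI used as a secondary analysis" (66)
# exceeds the 103 reviewed articles by one.
primary, secondary = 38, 66
overlap = primary + secondary - N_ALL
print(overlap)  # 1
```

The same check can be repeated for any other row by substituting its counts.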

ISSN: 1471-2288
