 Technical advance
 Open Access
 Open Peer Review
 Published:
Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis
BMC Medical Research Methodology volume 17, Article number: 129 (2017)
Abstract
Background
Multiple imputation is a recommended method to handle missing data. For significance testing after multiple imputation, Rubin’s Rules (RR) are easily applied to pool parameter estimates. In a logistic regression model, to consider whether a categorical covariate with more than two levels significantly contributes to the model, different methods are available. For example pooling chisquare tests with multiple degrees of freedom, pooling likelihood ratio test statistics, and pooling based on the covariance matrix of the regression model. These methods are more complex than RR and are not available in all mainstream statistical software packages. In addition, they do not always obtain optimal power levels. We argue that the median of the pvalues from the overall significance tests from the analyses on the imputed datasets can be used as an alternative pooling rule for categorical variables. The aim of the current study is to compare different methods to test a categorical variable for significance after multiple imputation on applicability and power.
Methods
In a large simulation study, we demonstrated the control of the type I error and power levels of different pooling methods for categorical variables.
Results
This simulation study showed that for nonsignificant categorical covariates the type I error is controlled and the statistical power of the median pooling rule was at least equal to current multiple parameter tests. An empirical data example showed similar results.
Conclusions
It can therefore be concluded that using the median of the pvalues from the imputed data analyses is an attractive and easy to use alternative method for significance testing of categorical variables.
Background
Logistic regression modelling is a frequently applied method in epidemiological and medical studies. Although researchers try to avoid it, missing data occurs in all kinds of different study designs, and inevitably, also when logistic regression modelling is used. There are many different methods available to handle incomplete data [1, 2]. The most recommended method is multiple imputation (MI). MI is currently implemented in almost all statistical software packages and therefore within reach of many researchers. Hence, it will likely be applied more often. MI generates multiple imputed datasets, where after complete data analysis can be applied to each imputed dataset. Finally, parameter estimates can be combined using Rubin’s Rules (RR) [3].
For logistic regression modelling in combination with MI, the pooled regression coefficients and standard errors can easily be obtained by using RR. The pooled coefficient is derived by averaging the regression coefficient estimates from each complete data analysis result across the imputed datasets. The standard error is obtained by pooling the between imputation variance and the within imputation variance which account for sampling and imputation uncertainty, respectively. The pooled standard error is used to calculate 95% confidence intervals. For dichotomous and continuous covariates in a logistic regression model after MI, RR can easily be applied in combination with a single Wald statistic to obtain a pvalue for significance [4].
To consider whether a categorical variable with more than two levels as a whole significantly contributes to the model, the methods to derive a pooled pvalue are less straightforward. One method that can be used is to combine multiple chisquare values that result from a multiple parameter Wald or likelihood ratio test in each imputed dataset [5]. Alternatively, the pooled multivariate sampling variances of the regression model can be used to conduct a test that resamples a multivariate Wald statistic [6]. Meng and Rubin proposed another method in which the likelihood ratio test statistics are combined to provide a pooled pvalue [7]. Unfortunately, none of these pooling methods are available in mainstream statistical packages. Consequently, the application of these methods may be complex for epidemiologists or other applied researchers, especially the Meng and Rubin pooling method. Furthermore, earlier studies showed that these methods do not always obtain optimal power levels, which is important for significance testing [7]. For that reason, it may be tempting for researchers to fall back on naïve methods, i.e., single imputation procedures, which often result in incorrect parameter estimates for statistical testing [2].
Van de Wiel et al. introduced the median of pvalues in a crossvalidation setting for inferring differences in prediction accuracies [8]. This setting is comparable to MI, because prediction accuracy is first obtained from separate (but related) generated versions of the data and subsequently inferred from those stochastically dependent data sets. This method showed proven control of the type I error rate and also good power results in different simulated data situations. It may therefore be a potential attractive method for significance testing of categorical variables.
Until now, methods to derive a pooled pvalue for significance testing of categorical variables in logistic regression models have never been compared for their control of the type I error rate and power levels in different epidemiological data situations. Therefore, the aim of this study is to compare different pooling methods for significance testing of categorical and also continuous covariates in a logistic regression model after multiple imputation. Specifically type I error control and power after MI in a large simulation study will be evaluated. Moreover the characteristics of the pooling methods are further evaluated in an empirical dataset. In the Multiple imputation section the procedure of multiple imputation is more extensively described. In the Statistical hypothesis testing of a variable after MI section the different pooling methods for statistical testing after MI are discussed and in the Simulation section a simulation study is described that compared the different methods for pooling pvalues of categorical variables. The methods are applied to a clinical dataset in the Application section.
Multiple imputation
Multiple imputation is an advanced method to handle missing data, commonly performed in three phases: imputation, complete data analysis and pooling. In the imputation phase the missing values are replaced with m sets of plausible values. These values are estimated from a series of regression models to generate a posterior predictive distribution of the missing values that is used to draw the imputed values from. Each variable can be modeled according to its own distribution, i.e., continuous variables are modeled with linear regression and dichotomous variables with logistic regression. Imputations are generated within several sequential iteration rounds or chains, referred to as Multivariate Imputation by Chained Equations (MICE) [9, 10].
In the complete data analysis phase each imputed dataset is analyzed separately. The analysis performed is the same method that would have been applied had the data been complete. Accordingly, the analysis phase results in m sets of complete data results. The complete data analysis results from each imputed dataset will differ, because the imputed datasets differ
Rubin’s rules (RR)
After the analyses the results are combined using pooling by RR. For parameter estimates (e.g., regression coefficients), the combined estimate \( \overline{\theta} \) is the average of the estimates from the imputed data analyses:
The standard errors of the parameter estimates are combined by using the withinimputation variance and the betweenimputation variance [11]. The within imputation variance Var( \( \overline{\theta} \) ) _{ within } is the average variance from the imputed data analyses:
The between imputation variance Var( \( \overline{\theta} \) ) _{ between } is the sum of the squared deviation of the parameter estimate of each imputed data analysis from the pooled parameter estimate weighted by m1:
The variance of the parameter estimates is then calculated by combining the within and between variance:
Statistical hypothesis testing of a variable after MI
For logistic regression analysis, statistical testing of covariates after MI can be performed by different methods. The methods to pool the statistical tests after MI will be elaborated below with the focus on testing whether a categorical variable as a whole significantly contributes to the model.
Univariate testing
For twosided hypothesis testing of single regression coefficients in a logistic regression model after MI the Wald statistic W can be calculated as follows:
where \( \overline{\uptheta} \) and \( \mathrm{Var}\left(\overline{\uptheta}\right) \) are the pooled coefficient and corresponding variance, respectively and θ_{0} is the value under the null hypothesis. W _{ single } follows a chisquare distribution with 1 degree of freedom. After MI, the Wald statistic can be calculated from the RR pooled statistics, which makes this an easy to apply method for continuous variables.
Multivariate testing
For categorical variables in the logistic regression model only the pooled statistics for each separate level of a categorical variable can be obtained by RR, not the overall statistic. RR requires access to the variancecovariance matrices. Accordingly, each category can be tested, but the categorical variable as a whole cannot be tested without adapting the method. The several different multivariate pooling methods are discussed below. The formulas for these methods can be found in detail in Additional file 1.
Multiple parameter Wald test (CHI pooling)
One possibility is to pool the chisquare values from the multiple parameter Wald or likelihood ratio tests with multiple degrees of freedom (CHI pooling) [5]. The multiple parameter values are obtained after applying the test to each imputed datasets separately.
The pooled sampling variance (VAR pooling) method
Alternatively, a combination of the pooled parameter estimates and the pooled sampling variances can be used to construct a test that resembles a multivariate Wald test (VAR pooling) [12]. This test pools within and between covariance matrices that are obtained in each imputed dataset and finally corrects the total parameter covariance matrix of the multivariate Wald test by including the average relative increase in variance to account for the missing data [13].
Meng and Rubin pooling (MR pooling)
Meng and Rubin proposed a method to test overall categorical variables indirectly based on the likelihood ratio test statistic (MR pooling) [7]. For each regression parameter, two nested models are fitted in each imputed dataset: one restricted model where the parameter is not included in the model and one full model where the parameter is included. The pooled likelihood ratio tests are then compared to obtain pooled pvalues for each parameter. The MR pooling method requires fitting multiple models for each variable in the data, hence it is an indirect approach. This can be a very timeconsuming process.
The median P rule (MPR)
For the median P rule one simply uses the median pvalue of the significance tests conducted in each imputed dataset (MPR pooling). Hence, it depends on pvalues only and not on the parameter estimates. The MPR was developed in a crossvalidation setting for comparing predictive performances of two methods [8]. In that setting, multiple splits of the same data set into training and test sets render multiple dependent pvalues. Various bounds for the type I error when using thresholds like median P < 0.05 were proven under a variety of assumptions. One of these is the multivariate normal nulldistribution (MVN) for pvalues that are transformed to a standard normal scale: then, median P < α implies a type I error smaller than α. In the imputation setting, dependence between the multiple pvalues is caused by the fact that all imputed datasets share the same observed data. Therefore, the dependency is likely to be strong. In the remainder of the paper we will use the significance rule median P < α, which we refer to as “median P rule” (MPR). For a real dataset the underlying MVN assumption cannot be checked, because we observe only one instance of the pvalue vector. In simulations, however, it can be checked using the asymptotic χ^{2}(p) distribution for the observed Mahalanobis distance as computed from the pvalues transformed to the standard normal scale. Fig. A1 in Additional file 1 shows the empirical distribution of Mahalanobis distance d together with the χ^{2}(10) distribution for 10 imputations (which allows reliable estimation of the inverse covariance matrix, required for computing d). These are based on 1000 simulations (see Simulation Section). Indeed, we observe a good match between the two for both the imputed categorical and the imputed continuous variable. Hence, we conclude that at least in this setting the MVN assumption is reasonable.
The MPR was evaluated by using the pvalues from the likelihood ratio test for multiple parameters for the categorical variables in the multivariable model. In addition, we performed extensive simulations to support the validity of the MPR (Simulation section) and we supply a bootstrapping scheme that allows anyone to check the appropriateness of this rule (and the aforementioned alternatives) for a given data set (Application section).
Simulation
Simulation design
To study the performance of the CHI, VAR, MR and MPR pooling methods after multiple imputation we conducted a simulation study. In this study, data was generated for 250 cases. The data contained one categorical variable with four categories (Factor1) and four continuous variables (Covar1 – Covar4). The categorical variable was first created as a continuous variable, and then categorized by the quartiles of the variable. The categorical variable and the four continuous variables were the covariates in a model for a dichotomous outcome. The predictors were related to the outcome by multiplying coefficient loadings with the data matrix, and the resulting predictor matrix was used to estimate the probability of the outcome using a lognormal transformation of the linear predictor. The categorical variable was coded in the matrix by three dummy variables.
To create a variety of settings the data characteristics were varied. The correlation between the variables was varied between 0.2, 0.4, 0.6 and 0.8. Furthermore we varied the relation of the variables with the outcome by adjusting the coefficient values (betas). The betas for the continuous variables were varied from 0 to 1 with steps of 0.1. The betas of the dummy variables were varied by drawing the coefficients from a normal distribution with mean zero and a variance that also varied from 0 to 1 with steps of 0.1. Hence, for each correlation variation, ten different coefficient situations were simulated for 1000 datasets. This resulted in 40 conditions with 1000 datasets in each of these conditions.
We created missing values in the categorical variable (i.e., Factor1) and in the first continuous variable (i.e., Covar1). The percentage of missing values in both variables was set to either 25% or 40%. Accordingly, we created 40 conditions with 25% missing data, and 40 conditions with 40% missing data. The missing data was related to the other continuous variables in the data in order to simulate a missing at random missing data situation [3]. Each dataset with missing data was then imputed by multiple imputation. The number of imputations was set to 100. The data were analyzed using a generalized linear model.
Comparing methods
We compared the performance of five methods to pool the pvalues of the variable tests. The first method that was used is RR. For the continuous variables this method is used by default in the MICE algorithm in R. However, for the categorical dummy variables, this method will produce three pooled pvalues in our study; one for each dummy. So no overall pvalue is obtained. The second method is MR pooling, the third method the chisquare test with multiple degrees of freedom (CHI pooling), the fourth method the multivariate sampling variance method (VAR pooling), and the fifth method the MPR, which pooled the pvalues from the overall likelihood ratio test in each imputed dataset by taking the median.
For each of the simulated data conditions, the average type I errors and powers of all pooling methods were compared for the incomplete categorical variable and the incomplete continuous variable. We compared the results of the pooled pvalues to the pvalues from the complete data. Those ‘full data’ pvalues were obtained by applying the generalized linear model to the simulated data without missing values, followed by computing the average type I error and power over the 1000 simulated data sets per condition.
Note that it has been shown that for the purpose of regression coefficient estimation, inclusion of the outcome variable in the imputation model (i.e., outcomebased imputation) is recommended [14]. However, for hypothesis testing, outcomebased imputation may lead to overoptimistic pvalues, rendering the pooled test result as too liberal. In the simulation study we investigated this aspect of the imputation model extensively by comparing the performances of the pooling methods for outcomebased imputation with the results when the outcome was excluded from the imputation model.
Description of results
Table 1 presents the type I error for each pooling method compared to the complete data type I error, which is considered as the full data type I error, in the simulated condition when the beta equaled zero and 25% missing data. For all existing pooling methods outcomebased imputation was used, whereas for the MPR, results are presented both with inclusion (MPR_{in}) and without (MPR_{out}). Table 1 shows that the type I errors for all existing pooling methods, and also for the MPR_{out}, were close to the target in all simulation conditions. For the MPR_{in} method the type I error was sometimes too liberal. These findings are confirmed in the situation where 40% of the data is missing (Additional file 2, Table B1).
The power for each of the pooling methods was evaluated for the categorical and continuous variable after imputation of the missing data. Note that the standard errors of the estimated type I error/power, denoted by \( \widehat{p} \), equal \( sd\left(\widehat{p}\right)=\sqrt{\widehat{p}\left(1\widehat{p}\right)/1000} \), which equals 0.007 for target type I error \( \widehat{p} \) = 0.05 and which has its maximum for \( \widehat{p} \) =0.5, rendering 0.015. Given that these standard errors are relatively small we have not displayed these in the Figures. Results are presented separately for outcomebased imputation and imputation where the outcome was excluded from the imputation model. Lastly, we present a summary result where the best strategy for each method is depicted.
Outcomebased imputation
Figure 1 depicts the power for each pooling method after outcomebased imputation for the condition of 25% missing data and a correlation of 0.4, for each simulated coefficient value (beta). Figure 1 shows that for the categorical variable, the power for the CHI, MR and VAR pooling methods was smaller than for the full data situation. The power for the MPR was closer to the full data power. This occurs for all beta values in all data conditions, however as was concluded from Table 1, the type I error for MPR was slightly inflated. For the continuous variable, the MPR results in inflated power levels compared with the full data and the other pooling methods up to a beta value of 0.3. For beta values beyond 0.3 the power for MPR was larger compared to the other methods and closer to the full data power. The conditions with 40% missing data resulted in larger differences between the MPR, CHI, MR, VAR and RR pooling methods than in the 25% missing data conditions (see Additional file 2 for a full overview of the results).
Outcome excluded from imputation model
Figure 2 depicts the power for each pooling method with the outcome excluded from the imputation model for the condition of 25% missing data and a correlation of 0.4, for each simulated coefficient value (beta). Figure 2 shows that for the categorical variable, the power of the CHI, MR and VAR pooling methods is smaller than that of both the full data analysis as of the MPR method. This is also the case for the continuous variable as well as for the situation with 40% missing data (see Additional file 2 for a full overview of the results).
Summary: Outcomebased imputation except for MPR
Figure 3 displays the results of the power analysis when for the RR, CHI, MR and the VAR pooling methods outcomebased imputation was applied, as recommended, and for the MPR when the outcome is excluded for imputation (MPR_{out}). It is clear from these figures that the power for the MPR_{out} method for categorical variables is higher than for all other pooling methods and closer to the full data results at all beta values. Opposite results are found for the continuous variables where RR, CHI, MR and the VAR pooling methods show better power results. The results for 40% missing data and correlations of 0.2, 0.6 and 0.8 confirm these findings with larger differences in power levels between the MPR_{out} and the other pooling methods (see Additional file 2).
Application
To illustrate our methods we used an example dataset adapted from a study about low back pain. The study population consisted of 299 workers that were listed as sick for a period of three weeks due to low back pain. Three treatment groups, highintensity back school, lowintensity back schools and usual treatment by the occupational physician, were compared in a randomized clinical trial. The results for the shortterm effects were published previously [15]. The primary outcome was the difference in pain after three months, measured with a dichotomous variable indicating a difference of at least three points. Possible continuous predictors for the outcome were age, body mass index (BMI), pain at baseline, physical functioning, disability, and kinesiophobia. Categorical predictors were treatment group (three categories), gender (two categories), education level (five categories), and some workrelated physical variables, which were daily exposure to sitting (four categories), heavy lifting (four categories), and working with vibration tools (four categories). The original data contained missing values: 6% for BMI; 5.4% for pain at baseline; 25.5% for physical functioning; 6% for kinesiophobia; 7.8% for educational level; 5.7% for sitting; 5.4% for heavy lifting; and 10% for working with vibration tools. We started with a complete data situation by imputing the missing values once for the purpose of this illustration. This complete dataset was used as a true reference and a multivariable logistic regression model was fitted to this data. Model estimates including the corresponding pvalues for this complete dataset are depicted in Table 2.
Missing values were generated in categorical and continuous variables by the missing at random mechanism, so the probability for missing data was related to other variables in the data, to create a realistic data situation. About 25% of the cases had missing data on BMI, education, heavy lifting, and physical functioning. This incomplete dataset was imputed 100 times without including the outcome variable in the imputation model for MPR pooling of categorical variables (MPR_{out}) and outcomebased imputation was used for the other variables and the RR, MR, CHI and VAR pooling methods. The variables without missing observations were included in the imputation model and multivariate imputation by chained equations was used to impute missing values. The same multivariable logistic regression model was fitted as for the complete data analysis on each imputed dataset. Subsequently, the pvalues for each independent variable were pooled according to the four compared methods: RR, MR pooling, CHI pooling, VAR pooling and the MPR_{out}. Listwise deletion was also applied and presented as comparison, where only the cases with completely observed data are included in the analysis. The resulting pvalues are depicted in Table 3, along with the complete reference data pvalues (‘Full data’) in the first column.
In this example we observe that the MPR performed similar to the Meng and Rubin method, the chisquare pooling with multiple degrees of freedom and pooled sampling variance method. The smaller pvalues were often closer to the pvalues from the complete data analysis for the MPR than for MR, CHI and VAR. Furthermore, multiple imputation improved the results in particular for Group and Sitting when comparing the pooled pvalues to the pvalues after listwise deletion.
We use this data example to show how one can verify control of type I error and power for the various pooling methods using a databased simulation. We used the estimated means and covariance matrix from our own data example to generate 1000 bootstrap samples of the covariates from a multivariate normal distribution. The sample size from the original data was used for the bootstrap samples (n = 299). Subsequently, we used the pooled coefficient estimates after MI from the analysis performed above to create the dichotomous outcome variable. Note that to obtain the coefficient estimates we used outcomebased imputation as recommended [14]. For simulating the outcome variable we only used the coefficients that were significantly contributing to the model by using the MPR pvalues with a threshold of P < 0.05. So the outcome was predicted by BMI, sitting, vibration tools, pain at baseline and disability. Then missing data were generated according to the pattern that was used in the example above and MI was applied with the different methods to pool the pvalues.
Table 4 presents the probability for each variable to obtain a significant result (P < 0.05). From the variables above the dashed line (that are present in the model), we conclude that the power of MPR is larger compared to RR, MR, CHI and VAR pooling. For example, we observe that the power for the detection of the association between the variable ‘Disability’ and the outcome was about 21% higher for MPR than for MR pooling. From the variables below the dashed line (not present in the model) we observe that the type I error of MPR is generally more on target than that of the other methods, although MPR was for some variables slightly anticonservative with a type I error slightly over 0.05, which is not very problematic in practice. These findings are in concordance with those of our simulation study.
Discussion
Multiple imputation is a frequently used method when covariate data is missing in prognostic models. In the process of defining the prognostic model one is most interested in the overall test to make a decision on the relevance of the categorical variable as a whole for the model. However, such a test is not easily applied in the case of MI and asks for (complex) adjustments in the pooling process or for switching between different software packages. If these packages are not available by the researcher, the researcher may fall back to simple but erroneously single imputation techniques. We showed in a simulation study that the MPR is an easytouse method for statistical testing of categorical variables in a multiple imputation context. The performance of the MPR is tested in many different data conditions and proved to be consistently satisfactory. In particular, when compared to alternative methods which are pooling the chisquare values with multiple degrees of freedom, using the pooled sampling variances and the method proposed by Meng and Rubin [7], the MPR performs equally well and the resulting pooled pvalue for the categorical variable is often more on target than the pooled pvalues derived from the other methods.
To obtain a powerful significance test for continuous and dichotomous variables with RR after MI, the MI procedure has to include the outcome variable, as was also indicated by Moons et al. [16]. For overall significance testing of categorical variables by using the multiple parameter Wald test (Chi Pooling), the pooled sampling variance (VAR pooling) method and Meng and Rubin pooling (MR pooling), we suggest following this recommendation. It should be mentioned though that these results assume a correct imputation model, and establishing robustness against misspecification of this model requires further study. The imputation should be outcomebased for continuous variables that are pooled with RR, or when applying the more complex pooling methods (i.e., CHI pooling, VAR pooling or MR pooling), because for these methods the pooling parameters are estimates that are pooled after which the result of the hypothesis test is obtained. However, to obtain correct and powerful pooled pvalues for significance testing of categorical variable as a whole with the MPR, the outcome should be omitted from the imputation model. Note that omitting the outcome for imputation may have a robustness advantage, because such imputation does not assume a specific model for the relation between outcome and covariates. In MPR, the hypothesis test results are directly pooled; the pooling parameters are the pvalues. In this case, using the outcome in the imputation model would lead to overoptimistic hypothesis test results, as was shown in the simulation. What are the practical consequences of our results? We suggest the following guideline: If the continuous variable(s) are most important: use one of the available pooling after outcomebased imputation methods. If the categorical variable(s) are of primary interest: use MPR with outcomeexcluded imputation. In both cases, both procedures render valid results for the other variables as well (in terms of type I error control), but may lack power. Hence, if pvalues for the other type of variables are just above 0.05, we recommend applying the alternative procedure to gain power. This comes at some computational cost, but, the MPR rule is very easy to apply in any software package that can perform MI and therefore timesaving in itself.
The usability of the pooling methods depends largely on their availability in statistical software. Software packages vary in methodology to pool parameters after MI. For example, In Stata and Mplus the multiple parameter pooling method (CHI pooling) can be used [17, 18]. There is separate SAS addon code available for CHI pooling and a translated version was developed for R [19]. In Mplus, SAS and R the Meng and Rubin test (MR pooling) is available [20]. For R this procedure is available in the MICE package [9, 20]. The pooled sampling variance method is available in Mplus, SAS and R. The strength of the MPR rule is that this rule can easily be applied posterior to MI in any software package. This is a large advantage for the many researchers that are most familiar with the use of statistical software package SPSS. These researchers do not have to switch to other software programs for the MI procedure in order to pool pvalues of categorical variables.
The parameters that are pooled with RR or in MR, CHI, and VAR pooling follow a normal distribution. To pool these parameters, the mean is used. The parameters that are pooled with MPR are pvalues, which do not follow a normal distribution. For that reason, it is warranted to pool using the median instead of the mean. As described in the paper by Marshall et al. [4], for other parameters, such as the proportion of variance explained or discrimination indices, the median may be a good summary estimator after MI. In future research, the application of MPR can be explored in many other situations where there is not yet a widely available pooling method at hand. Examples are nonparametric testing of variables after MI such as the pooled pvalues for spearman rho correlation coefficients. But also to pool pvalues from the Ftests of an analysis of variance (ANOVA). Van Ginkel and Kroonenberg developed a method to pool the Ftests of an ANOVA, but this procedure is rather complicated and not available in all software packages [21]. Also situations where likelihood ratio test statistics have to be pooled, i.e. comparing multilevel models in multiply imputed datasets, may benefit from the application of a pooling procedure such as MPR.
Conclusions
In conclusion, the MPR is an attractive rule for statistical inference of categorical variables with more than two levels because it has at least equal power as the multi parameter tests that are currently used but is much easier to apply in any software package.
Abbreviations
 ANOVA:

Analysis of variance
 BMI:

Body mass index
 CHI:

Chisquare test with multiple degrees of freedom
 MI:

Multiple imputation
 MICE:

Multiple imputation by chained equations
 MPR:

Median P rule
 MPR_{out} :

Median P rule with outcome excluded from imputation model
 MR:

Meng and Rubin pooling
 RR:

Rubin’s Rules
 VAR:

Pooled sampling variance method
References
 1.
Schafer JL, Graham JW. Missing data: our view of the state of the art. Psychol Methods. 2002;7:147–77. doi:10.1037//1082989X.7.2.147.
 2.
Eekhout I, de Vet HCW, Twisk JWR, Brand JPL, De Boer MR, Heymans MW. Missing data in a multiitem instrument were best handled by multiple imputation at the item score level. J Clin Epidemiol. 2014;67:335–42. doi:10.1016/j.jclinepi.2013.09.009.
 3.
Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92.
 4.
Marshall A, Altman DG, Holder RL, Royston P. Combining estimates of interest in prognostic modelling studies after multiple imputation: current practice and guidelines. BMC Med Res Methodol. 2009;9:57. doi:10.1186/14712288957.
 5.
Li KH, Meng XL, Raghunathan T, Rubin DB. Significance levels from repeated pvalues with multiplyimputed data. Statitsica Sin. 1991;1:65–92.
 6.
Enders CK. Applied missing data analysis. New York: The Guilford Press; 2010.
 7.
Meng XL, Rubin DB. Performing likelihood ratio tests with multiplyimputed data sets. Biometrika. 1992;79:103–11.
 8.
van de Wiel MA, Berkhof J, van Wieringen WN. Testing the prediction error difference between 2 predictors. Biostatistics. 2009;10:550–60. doi:10.1093/biostatistics/kxp011.
 9.
van Buuren S, GroothuisOudshoorn K. MICE: multivariate imputation by chained equations in R. J Stat Softw. 2009;45:1–67.
 10.
White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30:377–99. doi:10.1002/sim.4067.
 11.
Rubin DB. Multiple imputation for nonresponse in surveys. New York: John Wiley and Sons; 1987.
 12.
Enders CK. Analyzing longitudinal data with missing values. Rehabil Psychol. 2011;56:267–88.
 13.
Li KH, Raghunathan T, Rubin DB. Largesample significance levels from multiply imputed data using momentbased statitsics and an F reference distribution. J Am Stat Assoc. 1991;86:1065–73.
 14.
Moons KGM, Donders RART, Stijnen T, Harrell FE Jr, Harrell FE. Using the outcome for imputation of missing predictor values was preferred. JClinEpidemiol. 2006;59:1092–101. doi:10.1016/j.jclinepi.2006.01.009.
 15.
Heymans MW, de Vet HCW, Bongers PM, Knol DL, Koes BW, van Mechelen W. The effectiveness of highintensity versus lowintensity back schools in an occupational setting: a pragmatic randomized controlled trial. Spine (Phila Pa 1976). 2006;31:1075–82. doi:10.1097/01.brs.0000216443.46783.4d.
 16.
Moons KG, Donders RA, Stijnen T, Harrell FE Jr. Using the outcome for imputation of missing predictor values was preferred. J Clin Epidemiol. 2006;59:1092–101. doi:10.1016/j.jclinepi.2006.01.009.
 17.
StataCorp. 2013. Stata Statistical Software: Release 13. College Station, TX: StataCorp LP.
 18.
Muthén LK, Muthén BO. Mplus User’s Guide. Seventh ed. Muthén & Muthén: Los Angeles; 2012.
 19.
Schoemann A, Mille P, Rhemtulla M, Pornprasertmanit S, Enders CK. Combining likelihood ratio ChiSquare statistics from a multiple imputation analysis in R 2012.
 20.
Mistler SA. A SAS macro for Computing Pooled Likelihood Ratio Tests with Multiply Imputed Data, in Proceedings of the SAS Global Forum 2013, San Francisco, California: Contributed Paper (Statistics and Data Analysis). 2013;4402013.
 21.
van Ginkel JR, Kroonenberg PM. Analysis of variance of multiply imputed data. Multivariate Behav Res. 2014;49:78–91. doi:10.1080/00273171.2013.855890.
Acknowledgements
Not applicable
Funding
The research was funded by the VU University medical center in Amsterdam. The funding body had no role in the design of the study and collection, analysis and interpretation of data and in writing the manuscript.
Availability of data and materials
The datasets used, generated and analyzed during the current study are available form the corresponding authors on reasonable request.
Author information
Affiliations
Contributions
All authors contributed equally to the design of the study, interpretation of the results and writing the manuscript. IE conducted the simulation study and the analyses on the example data. All authors read and approved the final manuscript.
Corresponding author
Correspondence to Iris Eekhout.
Ethics declarations
Ethics approval and consent to participate
The RCT study used for the databased simulation was approved by the Medical Ethics Committee of the VU University medical center, Amsterdam.
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1:
Formulas of the different multivariate pooling methods. (DOCX 85 kb)
Additional file 2:
Full overview of simulation results. (PDF 81 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Eekhout, I., van de Wiel, M.A. & Heymans, M.W. Methods for significance testing of categorical covariates in logistic regression models after multiple imputation: power and applicability analysis. BMC Med Res Methodol 17, 129 (2017) doi:10.1186/s1287401704047
Received
Accepted
Published
DOI
Keywords
 Multiple imputation
 Pooling
 Categorical covariates
 Significance test
 Logistic regression
 Simulation study