 Research
 Open Access
 Published:
A simulation study for evaluating the performance of clustering measures in multilevel logistic regression
BMC Medical Research Methodology volume 21, Article number: 245 (2021)
Abstract
Background
Multilevel logistic regression models are widely used in health sciences research to account for clustering in multilevel data when estimating effects on subject binary outcomes of individuallevel and clusterlevel covariates. Several measures for quantifying betweencluster heterogeneity have been proposed. This study compared the performance of betweencluster variance based heterogeneity measures (the Intraclass Correlation Coefficient (ICC) and the Median Odds Ratio (MOR)), and clusterlevel covariate based heterogeneity measures (the 80% Interval Odds Ratio (IOR80) and the Sorting Out Index (SOI)).
Methods
We used several simulation datasets of a twolevel logistic regression model to assess the performance of the four clustering measures for a multilevel logistic regression model. We also empirically compared the four measures of cluster variation with an analysis of childhood anemia to investigate the importance of unexplained heterogeneity between communities and community geographic type (rural vs urban) effect in Malawi.
Results
Our findings showed that the estimates of SOI and ICC were generally unbiased with at least 10 clusters and a cluster size of at least 20. On the other hand, estimates of MOR and IOR80 were less accurate with 50 or fewer clusters regardless of the cluster size. The performance of the four clustering measures improved with increased clusters and cluster size at all cluster variances. In the analysis of childhood anemia, the estimate of the betweencommunity variance was 0.455, and the effect of community geographic type (rural vs urban) had an odds ratio (OR)=1.21 (95% CI: 0.97, 1.52). The resulting estimates of ICC, MOR, IOR80 and SOI were 0.122 (indicative of low homogeneity of childhood anemia in the same community); 1.898 (indicative of large unexplained heterogeneity); 0.3453.978 and 56.7% (implying that the between community heterogeneity was more significant in explaining the variations in childhood anemia than the estimated effect of community geographic type (rural vs urban)), respectively.
Conclusion
At least 300 clusters with sizes of at least 50 would be adequate to estimate the strength of clustering in multilevel logistic regression with negligible bias. We recommend using the SOI to assess unexplained heterogeneity between clusters when the interest also involves the effect of clusterlevel covariates, otherwise, the usual intracluster correlation coefficient would suffice in multilevel logistic regression analyses.
Background
Biomedical research studies often collect binary outcome data that have a multilevel structure. For example, in estimating the efficacy of a new drug on the treatment of patients with a viral infection (recovered or not), patients could be nested within the admitting hospitals. Similarly, for a public health specialist assessing the effect of a feeding intervention on the growth of a child (stunted or not), children could be nested within communities which are also nested within districts. In both examples, patients or children (level1 units) may have little variation in background characteristics within each hospital or community (level2 units). On the other hand, hospitals or communities may have patients or children drawn from a wide range of background characteristics, which may affect the withinhospital or withincommunity homogeneity. A standard logistic regression model, which assumes that the individual outcomes are independent will not consider the hierarchical structure of the data within hospitals or communities, and will consequently fail to distinguish between these two manifestations in each example. Ignoring the dependence between the outcomes may result in an exaggerated number of independent patients or children within hospitals or communities. For the strong homogeneity between patients in hospitals or children in communities, the effect would be an underestimation of the standard errors for the covariate effects, which may lead to incomplete and misleading conclusions on the association between the treatment and the viral infection or the feeding intervention and child growth [1–3].
Multilevel logistic regression models [4], also called hierarchical logistic models [5], are appropriate statistical techniques for analyzing clustered binary outcome data. The use of multilevel models for the analysis of multilevel data structures has exponentiated since the 1990s due to the rise in computational power [2]. The analysis of multilevel data is undertaken mostly to serve two purposes: to provide correct inferences by accounting for the variation that may exist between level2 units, such as hospitals or communities, and to assess the effects of level2 risk factors and the unexplained betweencluster (level2 unit) variation in explaining differences between individuals’ (level1 units) proneness to the binary outcome [1, 6, 7]. Several techniques in multilevel logistic regression have been proposed to explore whether the individual binary outcomes are explained by clusterlevel covariates and the unexplained betweencluster heterogeneity. Because of its simplicity in interpretation, especially in linear mixed models, the Intraclass correlation coefficient (ICC) (also known as the variance partition coefficient (VPC)) is commonly used to measure the unobserved heterogeneity between clusters in multilevel models [8, 9]. In multilevel logistic regression, the ICC loses its natural interpretation which is easily understood and interpreted in the linear mixed model. Thus, several approaches have been developed to provide interpretable and meaningful information regarding the betweencluster heterogeneity and the effect of clusterlevel covariates. These methods include the Median Odds Ratio (MOR), the 80% Interval Odds Ratio (IOR80) and the Sorting Out Index (SOI) [2, 10, 11]. Though they are rarely used, they have nonetheless offered explanations for the unobserved variations between clusters and the contribution of clusterlevel predictors on the individual binary outcomes [3, 12].
This paper assessed the performance of the ICC, MOR, IOR80 and SOI in measuring the importance of clusterlevel covariates and unexplained betweencluster heterogeneity in the analysis of a twolevel logistic regression model. We compared the performance using simulation studies as well as an empirical application to childhood anemia data from Malawi.
Twolevel logistic regression model
Suppose there are H clusters and each cluster has N_{h} individuals. Let Y_{hi} be the binary outcome taking a value of 1 or 0 for each individual i in cluster h. Also, let us assume that the probability of Y_{hi}=1 is π_{hi}, and π_{hi}/1−π_{hi} is the odds of Y_{hi}=1,(i=1,2,…,N_{h};h=1,2,…,H). For each individual hi, there is a vector X_{hi} containing values of p predictors, including clusterlevel covariates. The equation of a twolevel logistic regression model is given as
Equation (1) is the conditional logistic model where π_{hi} is the probability that each individual i in cluster h takes on the value Y_{hi}=1 and depends on the value of the cluster random effect, u_{h}, which captures the effect of unmeasured clusterlevel variability that affects the outcome propensity. They are usually assumed to be normally distributed with a mean of 0 and variance of \(\sigma ^{2}_{h}\), and are uncorrelated with the individual and clusterlevel predictor variables. In this way, heterogeneity of the binary outcome (Y_{hi}=1) propensity between the clusters would have been accounted for. For example, increasing values of u_{h} indicate that the associated clusters have higher propensities on the binary outcome (Y_{hi}=1) and greater heterogeneity between the outcomes is implied by higher values of \(\sigma ^{2}_{h}\).
Intraclass correlation coefficient (ICC)
In biomedical research studies, the intraclass correlation coefficient (ICC) statistic is used in multilevel analysis to measure the extent of withincluster homogeneity, which is caused by the multilevel structure of the data [13, 14]. The statistic is obtained by calculating the ratio of the amount of betweencluster variance to the sum of between and withincluster error variances, and it is computed as follows:
where \(\sigma ^{2}_{u}\) and \(\sigma ^{2}_{e}\) are the variances of the betweencluster and the withincluster errors, respectively. Since adding the two variances give the total variance of the outcome variable, ρ is often interpreted as the proportion of the variance explained by clustering, or as a measure of withincluster homogeneity. If the value of ICC is 0 then the observed outcomes are completely independent, and if the value is 1 then the outcomes are completely dependent on the cluster they belong to. Thus, the ICC ranges from 0 to 1, and using cutoff points in Koo et al. [13] and Merlo et al. [15], values of ICC of less than 0.50, between 0.50 and 0.75, between 0.76 and 0.90 and greater than 0.90 indicate low, moderate, high and very high correlations, respectively. For the linear mixed model, both the random effect and individual error terms are on the same linear scale, which makes it easier to compute the total variance in the outcome variable. However, for the multilevel logistic regression model, the clusterlevel variance is on the logistic scale which cannot be directly understood in terms of variations between clusters in the prevalence of the binary outcome. Furthermore, the individuallevel variance is taken from an underlying standard logistic distribution. The value of the variance for this underlying distribution is \(\sigma ^{2}_{e}=\pi ^{2}/3=3.29\). Thus, as pointed out in Merlo et al. [12] and Sanagou et al. [3], the usefulness of ICC is somehow limited, largely due to its dependence on linear mixed model concepts regarding partitioning of the total variance, its association with the prevalence of the outcome variable, and the different scales between clusterlevel and individuallevel error variances. Thus, for public health and epidemiological usages, explaining the partitioning of total variance and ICC for dichotomous outcomes is most challenging.
The median odds ratio (MOR)
As an alternative to the ICCbased clustering measure in multilevel logistic regression, Larsen and Merlo [10] proposed expressing the betweencluster variance on the odds ratio (OR) scale, which could then be easily interpreted as it is on the same scale on which the effects of predictor variables are measured. For the calculation of MOR, one considers a set of all odds ratios that could be obtained by comparing two individuals with identical covariates but from two different clusters, where one has higher random effect and the other with a lower random effect. The two random effect values are ordered so that the odds ratio is always at least one. The median of these odds ratios provides the Median Odds Ratio (MOR) statistic and, assuming that the random effects are normally distributed, the MOR is given as [10]:
where Φ^{−1}(0.75) is the 75th percentile of a standard normal distribution. If \(\sigma ^{2}_{u}=0\), then MOR=1, which is indicative of the absence of betweencluster heterogeneity, while if \(\sigma ^{2}_{u}\) is large (i.e. considerable betweencluster heterogeneity), the MOR will be much larger than 1 [3, 16]. Thus, the MOR measures heterogeneity between individuals belonging to different clusters.
Interval odds ratio  80% (IOR80)
The interval odds ratio (IOR), as developed in Larsen and Merlo [10] assesses the impact of clusterlevel predictor variables on the individual binary outcomes. One starts by considering two different clusters that differ by one unit on a given clusterlevel predictor variable. Two individuals with the same values for the remaining clusterlevel predictor variables and all the individuallevel covariates are chosen, one from each of the two clusters. Then the odds ratio (OR) for the predictor is computed, which often measures the effect of moving to a higher value of the given clusterlevel predictor (for a binary clusterlevel predictor variable, the OR is just the odds of it being 1 versus 0). Considering all possible pairs of individuals and their ORs, the 80% interval around the median of the distribution of the OR values provides the 80% interval odds ratio (IOR80). Under the assumption of a normally distributed random effect, the IOR80 has a lower limit of \(\text {IOR}_{L}=\exp (\alpha +\sqrt {(2\times \sigma ^{2}_{u})}\times ( 1.2816))\) and an upper limit of \(\text {IOR}_{U}=\exp (\alpha +\sqrt {(2\times \sigma ^{2}_{u})}\times (+ 1.2816))\) [10, 17], and values 1.2816 and +1.2816 represent the 10th and 90th percentiles of the standard normal distribution, respectively. Thus, the limits of the IOR80 are given as:
where α is the regression coefficient for the clusterlevel predictor of interest. If the betweencluster variance is small, then the IOR80% interval will be narrow, and if the betweencluster variance is large, then the IOR80% interval will be wider. IOR80% will contain a 1 if the cluster heterogeneity is large compared to the effect of the clusterlevel predictor. If the interval will not contain 1, then the effect of the clusterlevel predictor (positively or negatively) is larger compared to the unexplained betweencluster variation [10]. Thus, the addition of the clusterlevel predictor effect helps to compare the importance of betweencluster heterogeneity and clusterlevel predictors in understanding the differences in individual proneness to the outcome, Y_{hi}=1. However, the choice of the 80% interval width is rather arbitrary. Longer or shorter interval widths may or may not contain a 1 when the 80% width actually contains a 1.
Sorting out index (SOI)
To remedy the problem of IOR depending on the choice of the interval width, Chaix et al. [18] suggested using the percentage in the distribution of the odds ratios as described under the IOR for which the value is greater than 1. Thus, if the cluster residual variance is high, and the effect of the clusterlevel predictor is not able to distinguish highrisk clusters from lowrisk clusters, then the odds ratios between clusters for the cluster effect would be greater than 1 in half of the cases. On the other hand, if the clusterlevel predictor has a huge positive effect compared to the betweencluster heterogeneity, then the odds ratios would be greater than 1 in almost all the pairwise cases of individuals. The percentage range of 50100% is often used as a sorting out index (SOI) to assess and measure the extent to which a given clusterlevel covariate is important compared to the betweencluster heterogeneity in explaining the individual proneness to Y_{hi}=1.
The clusterlevel covariate could be assumed to be negatively associated with the outcomes, in which case one would formulate the SOI differently. It will still be in half of the cases if it is very weak, otherwise it will be in none of the cases where the odds ratio is greater than 1; then the SOI will range between 0 to 50%. Thus, the percentage range of the SOI can be between 0% and 100% [11, 19]. Assuming the random effect is normally distributed, the sorting out index (SOI) is calculated as:
The value of Φ for \(\left (\frac {\alpha }{\sqrt {2\sigma ^{2}_{u}}}\right)\) can be evaluated from a standard normal distribution table and is expressed as a percentage. This information indicates the degree to which the clusterlevel predictor is of importance as compared to the unexplained betweencluster heterogeneity. An SOI of near 0% or 100% implies an absence of betweencluster variability and a high negative or positive effect of the clusterlevel predictor, respectively. The SOI of 50% implies that the effect of clusterlevel predictor is low [19], as compared to the unexplained betweencluster heterogeneity.
Simulation protocol
We compared the performance of ICC, MOR, IOR80, and SOI on simulated twolevel binary datasets with individual and clusterlevel covariates. We then fitted twolevel logistic regression models to the datasets and compared the resulting estimates to the true values by the root mean square errors.
For the simulation, the response variable, Y_{hi}, was distributed as a Bernoulli random variable. The following logit model on the probability, π_{hi}=Pr(Y_{hi}=1), was used to simulate the binary responses:
We assumed the clusterspecific random effect, u_{h}, followed a normal distribution with a mean of 0 and a variance of \(\sigma ^{2}_{u}\) (i.e. \(u_{h}\sim N(0, \sigma ^{2}_{u})\)). The fixed effect parameters were fixed at β_{0}=0.5,β_{1}=−1.5,β_{2}=0.3,α=−0.3, and both level1 covariates X_{hp}∼N(0,1) for p=1,2, and level2 covariate X_{h}∼Bernoulli(0.5). For example, to generate 10 clusters with each cluster having 5 individuals with the outcome Y_{hi}=1 or 0, the following steps were implemented;

a)
We drew 10 random effects, u_{h}, for the 10 clusters distributed from a normal, \(u_{h}\sim N(0, \sigma ^{2}_{u})\), for example a value of \(\sigma ^{2}_{u}=0.2.\)

b)
Take the first value of the generated random effect, say u_{h}=1.78.
Repeat the following steps (cf) 5 times for cluster 1 with the random effect value of 1.78.

c)
Draw X_{hi1} from a N(0,1) and X_{hi2} from a N(0,1) and X_{h} from Bernoulli(0.5).

d)
Substitute the generated values into the equation: logit(π_{hi})=0.5−1.5X_{hi1}+0.3X_{hi2}−0.3X_{h}+1.78

e)
We then solve for \(\pi _{{hi}}= \frac {1}{1+ \exp {((\beta _{0}} + \beta _{1} X_{hi1} + \beta _{2} X_{hi2} + \alpha X_{h}+ u_{h}))}\)

f)
We then draw Y_{hi} from a Bernoulli with parameter π_{hi}, which results in either 1 or 0 as a binary outcome for subject 1 in the cluster 1.

g)
Return to step b) and pick the next value, for example 0.568, as the random effect value for cluster 2. This is repeated 10 times to generate 10 clusters with known random effect values, each cluster having 5 individuals.
Here, we fixed the values for \(\sigma ^{2}_{u}\) at 0.2, 0.5, 1.0 and 1.5. The resulting true values of ICC, MOR, SOI and IOR80 are shown in Table 1.
The process was repeated for the varied number of clusters (i.e., H = 10, 50, 100, 300, 500 clusters), number of observations within each cluster (i.e., N_{h} = 5, 20, 50, 100, 250). For each combination of the number of clusters, cluster size, and between cluster variance, 1000 datasets were simulated. The root mean square error (RMSE) \(\sqrt {\frac {(E_{i}O_{i})^{2}}{N}}\), where E_{i} was the estimated value and O_{i} the true value as shown in Table 1, were computed to determine their accuracies. We used the open software R (R Core Team, 2020) and packages rms (regression modeling strategies) [20], lmerTest and pbkrtest in simulating and analyzing the data.
Simulation results
Tables 2 and 3 present the accuracies of estimates of the fixed effects and measures of heterogeneity for the twolevel logistic regression model for a varied number of clusters, cluster sizes and cluster variances. Tables 4 and 5 present results of estimated values of the fixed effects and measures of heterogeneity for the 2level logistic regression model, under various numbers of clusters, cluster sizes, and cluster variances.
Accuracy of estimated parameters
We observe that the estimated parameters for the fixed effects were accurate for all clusterlevels and cluster variances, particularly for large clusters and cluster sizes. Lower root mean square errors were particularly observed for larger clusters, providing evidence of unbiasedness for the estimates \(\hat {\beta }\)’s and \(\hat {\alpha }\) for increased clusters and cluster sizes as shown in Tables 2 and 3. The RMSEs for \(\hat {\beta }_{0}\) were slightly biased for 10 clusters with a cluster size of 5. However, the estimates were unbiased for at least 10 clusters with a cluster size of at least 20 for all clusterlevel variances. At a cluster variance of 1.0, the estimates for \(\hat {\beta }_{0}\) were unbiased for at least 100 clusters with a cluster size of at least 5. Lower RMSEs for \(\hat {\beta _{1}}\) were observed for at least 10 clusters with a cluster size of at least 5 for all cluster variances. Bias for \(\hat {\beta _{1}}\) was negligible for 100 clusters with cluster size of at least 50 at a cluster variance of 0.5. At a cluster variance of 0.2, \(\hat {\alpha }\) was slightly biased for 10 clusters with a cluster size of 5. However, we observe unbiased estimates of α for 100 clusters with cluster sizes of 50, and 500 clusters with a cluster size of at least 5. The bias of \(\hat {\alpha }\) improved with the increase in clusters and cluster size for all cluster variances.
Performance of estimated heterogeneity measures
Intraclass correlation coefficient (ICC)
Tables 2 and 3 show that the estimates for the ICC were less biased when the cluster variance was 0.2 and the estimates of the ICC were unbiased with 100 clusters and clusters sizes of 20 or more. At the cluster variance of 0.5, the ICC was estimated with less bias for at least 100 clusters of sizes 50 or larger. At level2 variance of 1.0, the ICC was slightly biased for 10 clusters with cluster size of 5. However, the bias for estimating the ICC decreased for at least 50 clusters with cluster size of at least 20. At a level2 variance of 1.5, the lowest RMSEs were observed for the estimation of ICC with at least 10 clusters each of cluster size of at least 20. Increasing the number of clusters (at least 100) and cluster size (at least 20) gave negligible bias for the estimation of ICC as its RMSEs decreased to almost zero, for all level2 variances.
Median odds ratio (MOR)
As observed from Tables 2 and 3, the estimates of the MOR were largely biased for 50 clusters with cluster size of 5, for all cluster variances. at a cluster variance of 0.2, the estimates of the MOR were less biased for at least 50 clusters with cluster sizes of at least 20. The estimates of the MOR improved when cluster sizes were 50 or more for all clusterlevel variances. The lowest RMSEs were observed for at least 300 clusters and a cluster size of at least 50 in estimating the MOR.
80% interval odds ratio (IOR80)
The estimates for the IOR80 were less accurate for small clusters and cluster sizes when the cluster variance was large as observed from Tables 2 and 3. At a cluster variance of 0.2, we observe low RMSEs when estimating the IOR80 lower and upper limits for at least 10 clusters with a cluster size of at least 20. At a cluster variance of 0.5, the estimates of the IOR80 lower limit were accurate for at least 10 clusters with a cluster size of at least 20. However, the estimates for the IOR80 upper limit were largely biased for at least 10 clusters with a cluster size of at least 5 for cluster variances of 1.0 and 1.5. On the contrary, the estimates for the IOR80 lower limit at these cluster variances (1.0 and 1.5) were accurate for at least 10 clusters with a cluster size of at least 20. The accuracy of the IOR80 estimates were less biased with increased clusters and cluster sizes. At a cluster variance 0.2, unbiased estimates of the IOR80 lower and upper limits were observed for 100 clusters with cluster size of at least 50. Overall, the bias for the estimates of the I0R80 decreased for at least 300 clusters with a cluster size of at least 100 at all cluster variances.
Sorting out index (SOI)
As seen from Tables 2 and 5, the SOI estimates were biased for 10, 50, and 100 clusters with a cluster size of 5. However, the estimates of SOI were less biased for at least 10 clusters with cluster sizes of at least 20 at cluster variances of 0.2 and 0.5. Furthermore, the estimates for SOI were slightly biased for 10 clusters with a cluster size of 5. The bias in estimating the SOI decreased with increased clusters (at least 300) and a cluster size (at least 50) at clusterlevel variances of 0.2 and 0.5. In addition, for large cluster variances of 1.0 and 1.5, the estimates of the SOI were unbiased with increased clusters (at least 100) and cluster size (at least 50).
Application to childhood anemia in Malawi
For the application, we analyzed childhood anemia among underfive children in Malawi using the 201516 Malawi Demographic and Health Survey (MDHS) data by fitting a twolevel logistic regression model taking the community as a leveltwo unit. Childhood anemia is a major public health concern in SubSaharan Africa (SSA), particularly in Malawi [21]. We categorized anemia among children aged 6 to 59 months who were born five years before the survey as: no anemia = 0 and had anemia = 1.
There were 4676 children aged 6 to 59 months distributed across 850 enumeration areas (EA) which we regarded as communities. The overall prevalence of anemia was 63.1% (95% CI: 61.7, 64.5), with an average prevalence of childhood anemia of 62.2%, with a range of 0.0% to 100% and mode of 100%. The median anemia prevalence across the communities was 66.7%. The average number of children per community was 5.5, with a range of 1 to 15 and mode of 6. The median number of children per community was 5.
There are several factors that have been found to be associated with childhood anemia. These include individuallevel factors: age, gender of a child, mother’s age, birth rank, wealth index, mother’s education level, father’s education level, household size, household hunger status, sanitation services, water and electricity supply, and communitylevel factors: community geographic type and community development index [21–25]. For our application, we used mother’s education, mother’s age, child gender, household wealth quantile, and community geographic type (urban/rural). For the unadjusted model, the estimate of the betweencommunity variance was 0.497, translating to an ICC of 0.131 and an MOR of 1.954. Adjusting for the childlevel and the communitylevel covariates had the following odds ratios on child anemia: being female (OR= 0.92, 95% CI: 0.80, 1.04), being in a household with medium (OR= 0.71, 95% CI: 0.59, 0.84) and rich (OR= 0.78, 95% CI: 0.66, 0.93) wealth, having mothers with primary (OR= 0.69, 95% CI: 0.56, 0.86) and postsecondary (OR= 0.64, 95% CI: 0.49, 0.82) education, and residing in a rural community (OR= 1.21, 95% CI: 0.97, 1.52). For the covariateadjusted model, the estimate of the betweencommunity variance was 0.455. This translates to estimated values of ICC and MOR to be 0.122 and 1.898, respectively. The estimate of the ICC show that the unexplained heterogeneity between communities (i.e. low homogeneity in childhood anemia in a community) was low. While that of the MOR shows a high level of unexplained community heterogeneity (MOR »1). The estimates of the SOI was 0.567 (translated as 56.7%) and of the IOR80 was (0.345, 3.978), implying that the unexplained heterogeneity between communities was more significant in explaining a child’s propensity to be anemic than the effect of the community geographic type (rural vs urban).
Discussion
Multilevel logistic regression models, which recognize the fact that level1 units (individuals) are nested within higherlevel units (clusters) when estimating the effect of covariates at different levels on individual binary outcomes, are widely implemented in most standard statistical software packages. They measure and evaluate the relative variation in the outcome measures, between individuals within and between clusters. However, measuring residual clusterlevel variation and covariate effect heterogeneity or homogeneity (clustering) of binary outcomes within highlevel clusters is more challenging in multilevel logistic regression than for linear mixed models. We set out to compare the performance of four measures of clustering in a twolevel nested logistic regression model through a simulation study. The four measures were ICC, MOR, SOI and IOR80. We applied the four measures to an analysis of childhood anemia in Malawi using a twolevel logistic regression model, where the level2 units were communities.
The accuracy of the estimated coefficients for the twolevel logistic regression model improved for 100 to 500 clusters with cluster sizes of 50 to 250 for all cluster variances. These findings are similar to results from previous studies [26, 27], which showed that as the number of clusters and cluster sizes increased, the accuracy of the regression coefficients had a negligible bias. The estimates of ICC were more accurate than MOR, SOI and IOR80 when the cluster number was small (at least 10) and when the cluster size was small (at least 20), and the bias for the ICC was negligible when the number of clusters and the cluster sizes increased. This is similar to a study by Moineddin et al. [26] which found that increasing the number of clusters and cluster sizes improved the convergence rate for the ICC.
The estimates of the SOI were more accurate than the estimates for the MOR and the IOR80 for smaller number of clusters (at least 10) and a cluster size of at least 20 regardless of the cluster variance. However, the estimates of all the measures were more accurate when the number of clusters increased (from 100 to 500) regardless of cluster size and cluster variance. The estimates for MOR and IOR80 performed better for large clusters (at least 300) and large cluster size (at least 20) at larger cluster variance (1.0 and 1.5). Overall the ICC and SOI estimates were better than the estimates of the MOR and the IOR80 and the estimation of the heterogeneity between clusters using ICC and SOI was more accurate. However, the ICCs usefulness is limited as it does not account for covariates at the cluster level. Hence, the SOI would be the best statistic to measure the strength of clustering as compared to the ICC, MOR and IOR80, especially when between clusterlevel covariate heterogeneity is also of interest.
In the application to childhood anemia in Malawi, accounting for the purported risk factors, the estimated values of the ICC, and MOR were 0.122 and 1.898, respectively, which were indicative of low similarity between the status of anemia in children within a community, and high variability in unexplained heterogeneity between the communities. The estimates of SOI and IOR80 were 56.7% and (0.345, 3.978), respectively, which showed that unexplained heterogeneity between communities was of great significance in the understanding of anemia in the children than the child and communitylevel predictors, namely mother’s education, mother’s age, child sex, community geographic type and household wealth quantile, used in the model, and communitylevel geographic type (rural/urban). [10]. Mother’s age, wealth quintile and mother’s education were significantly associated with childhood anemia, similar to findings from previous studies [21, 23, 24]. However, the effect of community geographic type on childhood anemia was not significant. With 850 communities, an average of 5 children per community and the simulation study, we recommend using the ICC and SOI to measure the strength of heterogeneity between the communities. However, since the ICC does not take into account the communitylevel covariate (community geographic type), we recommend using the SOI to measure the strength of variation between communities for this application.
Conclusion
In conclusion, measures of the betweencluster heterogeneity and effects of clusterlevel covariates in multilevel logistic regression models would be estimated with negligible bias when the data has at least 300 clusters with a cluster size of at least 50. The estimation of the intraclass correlation coefficient and the sorting out index are accurate with 10 clusters and a cluster size of 20. We recommend using the sorting out index to assess unexplained heterogeneity between clusters as well as the effect of clusterlevel covariates. The usual intracluster correlation coefficient would suffice when only quantification of the betweencluster variation is the purpose of the multilevel logistic regression analysis.
Availability of data and materials
The dataset used in this study are available from the DHS website https://dhsprogram.com/Data/ upon request from the MEASURE DHS program team.
References
 1
Guo G, Zhao H. Multilevel modeling for binary data. Annu Rev Sociol. 2000; 26(1):441–62.
 2
Austin PC, Merlo J. Intermediate and advanced topics in multilevel logistic regression analysis. Stat Med. 2017; 36(20):3257–77. https://0doiorg.brum.beds.ac.uk/10.1002/sim.7336.
 3
Sanagou M, Wolfe R, Forbes A, Reid CM. Hospitallevel associations with 30day patient mortality after cardiac surgery: A tutorial on the application and interpretation of marginal and multilevel logistic regression. BMC Med Res Methodol. 2012; 12:28. https://0doiorg.brum.beds.ac.uk/10.1186/147122881228.
 4
Goldstein H, Vol. 922. Multilevel Statistical Models. Oxford: Wiley; 2011.
 5
Bryk AS, Raudenbush SW. Hierarchical Linear Models: Applications and Data Analysis Methods. New York: Sage Publications, Inc; 1992.
 6
Skrondal A, RabeHesketh S. Latent variable modelling: A survey. Scand J Stat. 2007; 34(4):712–45. https://0doiorg.brum.beds.ac.uk/10.1111/j.14679469.2007.00573.x.
 7
Snijders Roel TAB, Bosker RJ. Multilevel analysis: an introduction to basic and advanced multilevel modeling. London: Sage Publications; 1999.
 8
Pleil JD, Wallace MAG, Stiegel MA, Funk WE. Human biomarker interpretation: the importance of intraclass correlation coefficients (ICC) and their calculations based on mixed models, ANOVA, and variance estimates. J Toxicol Environ Health B. 2018; 21(3):161–80. https://0doiorg.brum.beds.ac.uk/10.1080/10937404.2018.1490128.
 9
Kul S, Vanhaecht K, Panella M. Intraclass correlation coefficients for cluster randomized trials in care pathways and usual care: Hospital treatment for heart failure. BMC Health Serv Res. 2014; 14(1):1–7. https://0doiorg.brum.beds.ac.uk/10.1186/147269631484.
 10
Larsen K, Merlo J. Appropriate assessment of neighborhood effects on individual health: Integrating random and fixed effects in multilevel logistic regression. Am J Epidemiol. 2005; 161(1):81–8. https://0doiorg.brum.beds.ac.uk/10.1093/aje/kwi017.
 11
Larsen K. New measures for understanding the multilevel logistic regression model. In: Bochum Workshop on “Statistische Methoden fur korrelierte Daten,” Bochum, Germany.2006. http://www.biometrie.uniheidelberg.de/statmethag/veranstaltungen/bochum06/Vortrag_Larsen.pdf.
 12
Merlo J, Chaix B, Yang M, Lynch J, Råstam L. A brief conceptual tutorial of multilevel analysis in social epidemiology: Linking the statistical concept of clustering to the idea of contextual phenomenon. J Epidemiol Community Health. 2005; 59(6):443–9. https://0doiorg.brum.beds.ac.uk/10.1136/jech.2004.023473.
 13
Koo TK, Li MY. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J Chiropr Med. 2016; 15(2):155–63. https://0doiorg.brum.beds.ac.uk/10.1016/j.jcm.2016.02.012.
 14
Thompson DM, Fernald DH, Mold JW. Intraclass correlation coefficients typical of clusterrandomized studies: Estimates from the robert wood johnson prescription for health projects. Ann Fam Med. 2012; 10(3):235–40. https://0doiorg.brum.beds.ac.uk/10.1370/afm.1347.
 15
Merlo J, Wagner P, Austin PC, Subramanian SV, Leckie G. General and specific contextual effects in multilevel regression analyses and their paradoxical relationship: A conceptual tutorial. SSM  Popul Health. 2018; 5:33–7. https://0doiorg.brum.beds.ac.uk/10.1016/j.ssmph.2018.05.006.
 16
Stapleton LM, McNeish DM, Yang JS. Multilevel and SingleLevel Models for Measured and Latent Variables When Data Are Clustered. Educ Psychol. 2016; 51(34):317–30. https://0doiorg.brum.beds.ac.uk/10.1080/00461520.2016.1207178.
 17
Larsen K, Petersen JH, BudtzJørgensen E, Endahl L. Interpreting parameters in the logistic regression model with random effects. Biometrics. 2000; 56(3):909–14.
 18
Chaix B, Merlo J, Gaignard J, Lithman T, Boalt A, Chauvin P. The social and spatial distribution of mental and behavioral disorders related to psychoactive substance use in the city of Malmö, Sweden, 2001. New York: Nova Science Publishers; 2005.
 19
Merlo J, Wagner P, Ghith N, Leckie G. An original stepwise multilevel logistic regression analysis of discriminatory accuracy: the case of neighbourhoods and health. PloS ONE. 2016; 11(4):0153778.
 20
Harrell F. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. Chapter 5: Resampling, Validating, and Simplifying the Model. 2001; 3:88–103.
 21
Ngwira A, Kazembe L. Analysis of severity of childhood anemia in Malawi: a Bayesian ordered categories model. Open Access Med Stat. 2016; 6:9. https://0doiorg.brum.beds.ac.uk/10.2147/OAMS.S95159.
 22
NgnieTeta I, Receveur O, KuateDefo B. Risk factors for moderate to severe anemia among children in Benin and Mali: Insights from a multilevel analysis. Food Nutr Bull. 2007; 28(1):76–89. https://0doiorg.brum.beds.ac.uk/10.1177/156482650702800109.
 23
Adedokun ST, Adekanmbi VT, Uthman OA, Lilford RJ. Contextual factors associated with health care service utilization for children with acute childhood illnesses in Nigeria. PLoS ONE. 2017; 12(3):e0173578. https://0doiorg.brum.beds.ac.uk/10.1371/journal.pone.0173578.
 24
Ncogo P, RomayBarja M, Benito A, Aparicio P, Nseng G, Berzosa P, SantanaMorales MA, Riloha M, Valladares B, Herrador Z. Prevalence of anemia and associated factors in children living in urban and rural settings from Bata District, Equatorial Guinea, 2013. PLoS ONE. 2017; 12(5):1–14. https://0doiorg.brum.beds.ac.uk/10.1371/journal.pone.0176613.
 25
Legason ID, Atiku A, Ssenyonga R, OlupotOlupot P, Barugahare JB. Prevalence of Anaemia and Associated Risk Factors among Children in Northwestern Uganda: A Cross Sectional Study. BMC Hematology. 2017; 17(10):1–9. https://0doiorg.brum.beds.ac.uk/10.1186/s1287801700810.
 26
Moineddin R, Matheson FI, Glazier RH. A simulation study of sample size for multilevel logistic regression models. BMC Med Res Methodol. 2007; 7(1):34. https://0doiorg.brum.beds.ac.uk/10.1186/14712288734.
 27
Hox J. Multilevel Modeling: When and Why. 1998:147–54.
Acknowledgments
HT greatly thanks the Biostatistics Research Unit at the South African Medical Research Council (SAMRC) for hosting her. She also thanks the L’Oreal UNESCO for Women in Science for the support through the 2020 Young Talents award.
Funding
SOMM was funded by the South Africa Medical Research Council (SAMRC). The funders had no role in the design, analysis, interpretation or writing of the manuscript.
Author information
Affiliations
Contributions
NA carried out the simulation studies and wrote the initial draft of the paper. SOMM conceived the research ideas including the simulation plans and extensively revised the manuscript for intellectual content. HT helped with checking the R code for the simulation studies and the revisions of the manuscript. All authors have approved the final version for submission. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
For the Malawi Demographic and Health Survey (MDHS) data, ethical approval was not required as the National Statistics Office (NSO) of Malawi that conducted the survey was responsible for seeking ethical clearance on the questionnaires and protocol. The ethical clearance was granted by the Malawi Health Research Committee, Institutional Review Board of ICF Macro, Centre for Disease and Control (CDC) in Atlanta, GA, USA and Prevention IRB. Additionally, informed consent and anonymity do not apply to our study as these were done before and after data collection by the responsible implementing institution.
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Samuel O.M. Manda is the senior author.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Adam, N.S., Twabi, H.S. & Manda, S.O. A simulation study for evaluating the performance of clustering measures in multilevel logistic regression. BMC Med Res Methodol 21, 245 (2021). https://0doiorg.brum.beds.ac.uk/10.1186/s12874021014174
Received:
Accepted:
Published:
DOI: https://0doiorg.brum.beds.ac.uk/10.1186/s12874021014174
Keywords
 Heterogeneity measures
 Clusters
 Multilevel logistic regression
 Childhood anemia
 Binary outcomes