 Research Article
 Open Access
 Published:
Can the buck always be passed to the highest level of clustering?
BMC Medical Research Methodology volume 16, Article number: 29 (2016)
Abstract
Background
Clustering commonly affects the uncertainty of parameter estimates in epidemiological studies. Clusterrobust variance estimates (CRVE) are used to construct confidence intervals that account for singlelevel clustering, and are easily implemented in standard software. When data are clustered at more than one level (e.g. village and household) the level for the CRVE must be chosen. CRVE are consistent when used at the higher level of clustering (village), but since there are fewer clusters at the higher level, and consistency is an asymptotic property, there may be circumstances under which coverage is better from lower rather than higherlevel CRVE. Here we assess the relative importance of adjusting for clustering at the higher and lower level in a logistic regression model.
Methods
We performed a simulation study in which the coverage of 95 % confidence intervals was compared between adjustments at the higher and lower levels.
Results
Confidence intervals adjusted for the higher level of clustering had coverage close to 95 %, even when there were few clusters, provided that the intracluster correlation of the predictor was less than 0.5 for models with a single predictor and less than 0.2 for models with multiple predictors.
Conclusions
When there are multiple levels of clustering it is generally preferable to use confidence intervals that account for the highest level of clustering. This only fails if there are few clusters at this level and the intracluster correlation of the predictor is high.
Background
Observations are often grouped in assortative clusters, so that two observations from the same cluster tend to be more similar than two selected at random. For example, members of the same household might share genetic and environmental risk factors such that the presence of a disease in one member is predictive of that in others in the same household.
Clustering can influence the amount of uncertainty in parameter estimates. For the sample mean, the standard estimate of the variance must be inflated by a factor 1+ρ(n−1), where ρ is the intracluster correlation, which equals the ratio of the variance of cluster means to the total variance of the observations [1], and n is the number of clusters. For measures of association between an outcome (y) and predictor (x) the effect of clustering in the outcome is complicated by the distribution of x across clusters—i.e., the degree of clustering in x—and it may not always inflate the variance. In a linear regression model the variance of the regression coefficient associated with the predictor is increased by 1+(n−1)ρ _{ x } ρ _{ y } relative to the OLS estimate [2, 3]. Thus clustering has no effect when either ρ _{ x } or ρ _{ y } is zero and a large effect when both are close to one.
Generally, parameter estimates from generalised linear models, such as logistic regression, are consistent in the presence of clustering, provided that the relationship between the mean of the outcome and the predictor variables is correctly specified. But the standard variance estimates of the regression parameters that ignore clustering are not consistent, and therefore confidence intervals that are based on these variance estimates are incorrect [4]. Fortunately, it is possible to obtain consistent variance estimates for regression parameters using clusterrobust variance estimates (CRVE), which are consistent irrespective of the correlation structure within clusters, provided that observations between clusters are independent [4]. In particular, when there is more than one level of clustering (e.g., individuals clustered in households and households clustered in villages), then CRVE applied at the higher level are consistent, despite the correlation structure within the higherlevel clusters (villages) being complicated by correlations within the lowerlevel clusters (households). Thus a researcher who is faced with multiple levels of clustering can obtain consistent confidence intervals by using CRVE at the highest level of clustering: Angrist and Pischke refer to this as ‘passing the clustering buck’ to the higher level [5].
Consistency, however, guarantees lack of bias only asymptotically, i.e., for sufficiently large sample sizes. Unfortunately, CRVE are biased when there are few clusters. Furthermore, the bias is usually downward so that confidence intervals are too narrow [6]. There is therefore a tradeoff. At the lower level of clustering there will be more clusters, but observations from different clusters will be dependent. While, at the higher level, observations from different clusters are more likely to be independent but there will be fewer clusters and the CRVE will be biased.
In this study we explore this tradeoff in the context of logistic regression. We use a random effects (conditional) model to simulate binary data that are clustered at two levels, and fit a marginal model to these data, using CRVE to adjust for clustering at either the higher or the lower level. Before we present the simulation, we describe the relationship between marginal and conditional models, and discuss the intracluster correlation as a measure of the degree of clustering.
Methods
Marginal and conditional models
We model the relationship between a binary outcome and a set of binary predictors in the presence of nested clusters, where the disease and predictors can vary in prevalence between clusters. For example, we might want to predict the probability of a disease based on certain risk factors, and the disease and risk factors are known to cluster in households and villages. In this example, households are the lowerlevel clusters, and they are nested in villages because members of a household belong to the same village.
One approach used to account for clustering is to include random effects in the regression model. For example, we might model the effects of household and village as independent, normally distributed random variables z _{ jk } and u _{ k } and include these, together with the predictors x _{1},…,x _{ p }, in the model
where π _{ ijk } is the probability of disease in individual i from household j and village k.
We refer to this as the conditional model as the parameter estimates for the predictor variables are conditional on the village and household effects. The model can be fitted by integrating the likelihood over the distribution of the unobserved random effects of village and household, and then maximising this marginal likelihood. A drawback of this approach is that it is necessary to assume distributions for the random effects, and the parameter estimates can be sensitive to the choice of distribution [7].
Alternatively, we can fit a marginal, or population average, logistic regression model that ignores clustering
The parameters of this model can be estimated by fitting the model using maximum likelihood, ignoring the cluster effects. This is equivalent to solving a set of estimating equations (Eq. A2 in the Appendix) that have been derived by setting the derivative of the loglikelihood to zero. Each parameter estimates is consistent, provided that the relationship between the probability of disease and predictor variables is correctly specified, but the usual variance estimate based on the second derivative of the loglikelihood is not correct. For a single level of clustering, a cluster robust variance estimate (CRVE) can be used instead (see appendix). This estimate is unbiased as the number of clusters tends to infinity, but may be biased when the number of clusters is small. When there are multiple levels of clustering, a consistent variance estimate can be obtained by adjusting for clustering at the higher level—this implicitly accounts for lowerlevel clustering—but, since the number of higherlevel clusters is often small, bias maybe a concern.
The parameters, apart from the intercept, represent log odds ratios in both models. But they are interpreted differently in the two models. For example, for a single, binary predictor, x _{1}, β _{1} is the difference in log odds comparing individuals with x _{1}=1 and x _{1}=0 across the whole population; while α _{1} is the difference comparing x=1 and x=0 within a household. The odds ratio, unlike the risk difference and risk ratio, is not collapsible across strata, therefore α _{1} and β _{1} will be different unless α _{1}=β _{1}=0 or there is no variation between households and villages in disease risk.
In general, the relationship between the two sets of parameters can be derived by imagining a dataset that consists of the entire population, and that is generated by the random effects model. The parameters of the marginal model are the ‘estimates’ that are obtained when the marginal model is fitted to this dataset. Mathematically, this is equivalent to solving equation A2 in the appendix, after replacing Y _{ ij } with E _{ α }[Y _{ ij }x _{ ij }]=π _{ ij } [8]. Using this approach, Zeger et al. [8] derive the following relationship
where α is the vector of parameters from the random effects model, β is the vector of parameters from the marginal model and \(c=16\sqrt {3}/(15\pi)\). From equation 3, it can be seen the odds ratio is closer to the null in the marginal model than the random effects model, and the magnitude of the difference between the odds ratios depends on the amount of variation between clusters, both at the level of the household and the village.
Intracluster correlation
The variance of a regression parameter estimate depends on the amount of clustering in both the outcome and the predictor. The intracluster correlation, defined as the correlation between two observations from the same cluster, can be used to quantify the degree of clustering in both variables. Mathematically, it is defined as
where μ is the overall mean and the expectation is taken over all clusters and pairs of observations within clusters [1]. Assuming that observations are independent conditional on the cluster
where μ _{ j } is the mean for cluster j. Therefore ρ represents the ratio of the variance in cluster means to the overall variance of the observations.
By definition, ρ=1 for clusterlevel variables because all the variation is then between clusters, but ρ is less than 1 when variables pertain to lowerlevel units. For example, in a study where data are collected from different villages, village size would be a clusterlevel variable with ρ=1, but for household and individuallevel variables ρ<1. In fact, the intra cluster correlation is usually considerably less than 1 for observations made on lower level units. In a survey of binary and continuous outcomes recorded in clusterbased studies conducted in primary care the median intracluster correlation was 0.01 and 90 % were less than 0.055 [9].
The intracluster correlation of the outcome can be calculated directly from the random effects model (Eq. 1) for given values of the parameters and covariate. The intracluster correlation can also be calculated for each of the predictors, but in this case since these are not defined by a stochastic model it is calculated based on an empirical version of Eq. 4. Note that Eq. 5 implies that ρ≥0, but for the predictors the intracluster correlation is calculated from the sample rather than the model, consequently the independence assumption necessary for Eq. 5 is not met and the intracluster correlation is not necessarily positive. In fact, it reaches a lower bound of −1/(n−1) when the prevalence of the predictor is the same in each of n clusters [3]. We will use the notation ρ _{ y } to denote intracluster correlation defined by the stochastic model for the outcome and \(\hat {\rho }_{x}\) to denote the empirical intracluster correlation of a predictor.
Simulation
We conducted a simulation study to explore the coverage of confidence intervals for the parameters of the marginal model. The parameter values used in the simulation are given in Table 1, and we estimated coverage for every combination of these parameters.
For each parameter combination, we estimated coverage by simulating 10,000 samples from the population using the conditional model (Eq. 1). The marginal model (Eq. 2) was fitted to each sample, and we calculated confidence intervals unadjusted for clustering (CI ^{(un)}), and intervals adjusted for clustering within households (CI ^{(hh)}) and villages (CI ^{(vil)}). We estimated the coverage for each type of interval by calculating the proportion of the 10,000 intervals that contained the true marginal log odds ratio, which was calculated by solving Eq. A2 in the Appendix with Y _{ ij } replaced by E _{ α }[Y _{ ij }x _{ ij }]=π _{ ij } (see previous section on marginal and conditional models).
We used predictors of the outcome that varied in their degree of clustering within households and villages. At the extremes, we explored predictors where the proportion positive for the predictor was the same in each village such that \(\hat {\rho }_{x}^{(vil)} =1/(n1)\), and predictors where the village consists entirely of positives or negatives \(\hat {\rho }_{x}^{(vil)}=1\). Table 2 shows, for each predictor, the proportion of individuals positive in each village. We used both household, x _{1}−x _{4}, and individuallevel, x _{5}−x _{7} predictors. The former are the same for all members of the household (e.g., household income) and the latter vary between household members (e.g., age). We fitted models with a single predictor and also multivariable models that included all the predictors simultaneously.
CI ^{(hh)} and CI ^{(vil)} were calculated using CRVE (see Appendix) with two corrections to adjust for downward bias. First, the CRVE was inflated by a factor of n/(n−1), where n is the number of clusters. Second, the confidence interval was calculated using a tdistribution with n−1 degrees of freedom as the reference distribution rather than a standard normal distribution.
We did the simulations in R [10] using the rms package [11] to fit logistic regression models and calculate CRVE.
Results
Simulation results
CI ^{(un)} and CI ^{(hh)} had close to 95 % coverage when there was limited villagelevel clustering in the outcome or predictor, but for both coverage decreased as clustering in the outcome and predictor increased (Fig. 1). CI ^{(vil)} performed well when the number of villages was large, and also when the number of villages was small (K=5), provided that the predictor was not too strongly clustered at the villagelevel. For example, coverage was more than 85 % for \(\hat {\rho }_{x}^{(vil)}<0.5\), in models with a single predictor, and in models that included all predictors simultaneously it was more than 85 % for \(\hat {\rho }_{x}^{(vil)}<0.2\). CI ^{(vil)} was only outperformed by CI ^{(hh)} when there was limited villagelevel clustering in the outcome \((\rho _{y}^{(vil)}<0.02)\) and the intracluster correlation of the predictor was close to 1 \((\hat {\rho }_{x}^{(vil)}\approx 1)\).
Our findings were similar irrespective of whether a household (Fig. 1) or individuallevel predictor (Additional file 1: Figure S1) was used. They were also similar when all predictors (individual and householdlevel) were included in the model simultaneously (Additional file 2: Figure S2 and Additional file 3: Figure S3), although the coverage of CI ^{(vil)} was less good.
Example
We illustrate our findings by analysing data from a randomised trial of a house screening intervention to reduce malaria in children 6 months to 10 years [12]. The intervention was evaluated in terms of its impact on the numbers of mosquitoes caught, anaemia and malaria parasitaemia. The study also collected data on risk factors for malaria, including bed net use. Here we will focus on the presence of malaria parasites in the child, and estimate its association with bed net use and house screening. We use data from the six largest villages (or residential blocks in urban areas) collected on 428 children living in 209 households. The protocol was approved by the Health Services and Public Health Research Board of the MRC UK and The Gambia Government and MRC Laboratories Joint Ethics Committee, and the Ethics Advisory Committee of Durham University. All participants provided consent.
At householdlevel, malaria was strongly clustered, as were the two predictors: the intracluster correlation was 0.47 for malaria, 0.79 for bed net use and 1 for screening (by design). At the villagelevel, malaria and bed net use were strongly clustered (intracluster correlations 0.28 and 0.33), but screening was not clustered because it was randomly allocated to households.
The odds ratio for screening was 1.13 and the 95 % confidence interval adjusted for household clustering was 0.55 to 2.31. Since there are many households and screening is uncorrelated with village, we expect the coverage of CI ^{(hh)} to be close to 95 %.
The odds ratio for bed net use was 0.90. The confidence interval adjusted for household clustering was 0.50 to 1.63 and adjusted for village clustering it was 0.30 to 2.76. Because malaria and bed net use are both highly clustered at the villagelevel, we expect that CI ^{(vil)} will have better coverage than CI ^{(hh)}.
To further explore the difference between coverage of the two confidence intervals, we fitted a random effects model to the malaria data with bed net use as the predictor. We then simulated samples from this model to estimate the coverage of CI ^{(hh)} and CI ^{(vil)} for the marginal odds ratio, using the approach described in the previous section. As predicted, we found that the coverage of CI ^{(hh)} (68 %) was considerably worse than CI ^{(vil)}, which had reasonable coverage (85 %), despite the small number of villages.
Discussion
In general, we recommend using CRVE to adjust for clustering at the higher level. From simulation studies, we found that generally the coverage was better when confidence intervals were adjusted for the higher level of clustering. Adjusting for the lower level of clustering only gave better coverage (i.e., a higher proportion of confidence intervals included the true odds ratio) when the number of higherlevel clusters was small and the intra cluster correlation of the predictor at this level was close to 1. Neither adjustment produced satisfactory coverage when, at the higher level, there were few clusters and the outcome and predictor were both highly correlated with cluster.
We used two simple adjustments to improve the coverage of confidence intervals: the variance estimate was multiplied by n/(n−1), and the tdistribution with n−1 degrees of freedom was used as the reference distribution rather than the standard normal distribution. Both adjustments are implemented in the svyset command in Stata. Other methods for adjusting confidence intervals might give better coverage, but are not currently implemented in routine software. Pan and Wall [13] suggest modifying the degrees of freedom used for the reference tdistribution, and a number of authors have proposed methods for correcting for the bias in the variance estimates [14–16]. Bootstrap methods, in which clusters are resampled with replacement, offer another approach, but do not perform better than CRVE [17].
The results we have presented here are from simulation, rather than algebraic demonstration. Nevertheless the simulations cover wide ranges of the key parameters— ρ _{ x } and ρ _{ y } at the higher level of clustering—and our conclusions were not sensitive to the values used for the other parameters, except the number of higher level clusters. For this parameter we present results for a small (K=5) and large (K=20) number. At K=20 the coverage was close to 95 % when confidence intervals were adjusted for higherlevel clustering, and we expect coverage to improve if K>20. We chose not to explore with further granularity the region of parameter space where adjustment at the lower level of clustering is favourable (high ρ _{ x }, low ρ _{ y } at the higher level of clustering and a small number of clusters at this level) because the region is small and the lower level has only a slight advantage here.
We have explored the performance of standard errors adjusted for clustering without adjusting the log odds ratio. In the framework of Generalised Estimating Equations (GEE) this is equivalent to assuming an ‘independence’ working correlation matrix. The log odds ratio can be estimated more efficiently (i.e., with smaller asymptotic variance) if the correlation structure is used to inform the estimate. For a single level of clustering, a constant correlation between observations from the same cluster is often assumed—the socalled ‘exchangeable’ correlation structure. When there are multiple levels of clustering one could assume a constant correlation at the higherlevel, but this is a crude approximation because the correlation between observations from the same higherlevel cluster will depend on whether they also come from the same lowerlevel cluster. Several authors have therefore modelled the correlation structure that occurs when there is multilevel clustering and have demonstrated that this gives more efficient estimates compared to either the independence or the exchangeable structure [18–20]. While these methods provide benefit in terms of efficiency, the complicated correlation structure is not easily implemented in standard software, and the additional parameters can lead to problems with convergence, particularly when the number of cluster is small [20]. Furthermore, the loss of efficiency that results from assuming an independence structure is generally small [4, 21], except when the intracluster correlation of the outcome is large (ρ _{ y }>0.3) and the predictor varies within clusters [22]. The relative simplicity of assuming an ‘independence’ correlation structure (i.e., the CRVE approach discussed in this manuscript) might therefore remain attractive to the applied researcher, even if the resulting estimate it is not the most efficient.
Conclusions
CRVE are commonly used to construct confidence intervals that take account of clustering. When clustering occurs at multiple levels, CRVE can be used at the higher level of clustering, except if there are few clusters at this level and the intracluster correlation of the predictor is high.
Appendix
In a logistic regression model, the relationship between a binary variable Y _{ ij } and predictors x _{1i j },…,x _{ pij } is
where π _{ ij }=P (Y _{ ij }=1x _{ ij }) for observation i from cluster j.
Assuming responses are independent, the maximum likelihood estimate, \(\hat {\beta }\), is the solution to the equations
where Y _{ j } is a column vector of responses in cluster j, and X _{ j } is matrix whose columns are the predictors of Y _{ j }. Since the Y _{ j } are independent, it can be shown by the central limit theorem and using a Taylor expansion that, asymptotically, as the number of clusters (n) tends to infinity, \(\hat {\beta }\) is normally distributed with mean β and variance
where \(\text {Var}(U(\beta))=\sum _{j} X_{j}' \text {Var}(Y_{j}) X_{j} \), \(\frac {\partial U'}{\partial \beta }=\sum _{j} X_{j}' V_{j} X_{j}\) and V _{ j } is a diagonal matrix with elements π _{ ij }(1−π _{ ij }).
The socalled sandwich estimator, which is also referred to as the cluster robust variance estimate (CRVE), is obtained by replacing π _{ ij } in V _{ j } with \(\hat {\pi }_{ij}\) and using \((y_{j}\hat {\pi }_{j})(y_{j} \hat {\pi }_{j})'\) to estimate the covariance matrix Var(Y _{ j }).
Abbreviations
 CRVE:

Cluster robust variance estimate
 OLS:

Ordinary least squares
 CI:

Confidence interval
 GEE:

Generalised estimating equation
 SD:

Standard deviation
References
 1
Eldridge SM, Ukoumunne OC, Carlin JB. The intracluster correlation coefficient in cluster randomised trials: a review of definitions. Int Stat Rev. 2009; 77(3):378–394.
 2
Moulton BR. Random group effects and the precision of regression estimates. J Econ. 1986; 32(3):385–397.
 3
Scott AJ, Holt D. The effect of twostage sampling on ordinary least squares methods. J Am Stat Assoc. 1982; 77(380):848–854.
 4
Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986; 73(1):13–22.
 5
Angrist JD, Pischke JS. Mostly harmless econometrics. 6 Oxford Street, Woodstock, Oxfordshire OX20 1TW: Princeton University Press; 2009.
 6
Bell RM, McCaffrey DF. Bias reduction in standard errors for linear regression with multistage samples. Surv Methodol. 2002; 28(2):169–181.
 7
Hubbard AE, Ahern J, Fleischer NL, Van der Laan M, Lippman SA, Jewell N, et al.To GEE or not to GEE: comparing population average and mixed models for estimating the associations between neighborhood risk factors and health. Epidemiology. 2010; 21(4):467–74.
 8
Zeger SL, Liang KY, Albert PS. Models for longitudinal data: a generalized estimating equation approach. Biometrics. 1988; 44(4):1049–60.
 9
Adams G, Gulliford MC, Ukoumunne OC, Eldridge S, Chinn S, Campbell MJ. Patterns of intracluster correlation from primary care research to inform study design and analysis. J Clin Epidemiol. 2004; 57(8):785–94.
 10
Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing. Austria: Vienna; 2015. http://www.rproject.org/. Accessed 24 Feb 2016.
 11
Harrell FE. rms: Regression Modeling Strategies. R package version 4.40. 2015. http://cran.rproject.org/package=rms. Accessed 24 Feb 2016.
 12
Kirby MJ, Ameh D, Bottomley C, Green C, Jawara M, Milligan PJ, et al.Effect of two different house screening interventions on exposure to malaria vectors and on anaemia in children in The Gambia: a randomised controlled trial. Lancet. 2009; 374(9694):998–1009.
 13
Pan W, Wall MM. Smallsample adjustments in using the sandwich variance estimator in generalized estimating equations. Stat Med. 2002; 21(10):1429–1441.
 14
McCaffrey DF, Bell RM. Improved hypothesis testing for coefficients in generalized estimating equations with small samples of clusters. Stat Med. 2006; 25(23):4081–4098.
 15
Fay MP, Graubard BI. Smallsample adjustments for Waldtype tests using sandwich estimators. Biometrics. 2001; 57(4):1198–1206.
 16
Mancl LA, DeRouen TA. A covariance estimator for GEE with improved smallsample properties. Biometrics. 2001; 57(1):126–134.
 17
Cameron AC, Miller DL. A practitioner’s guide to clusterrobust inference. J Hum Resour. 2015; 50(2):317–372.
 18
Qaqish BF, Liang KY. Marginal models for correlated binary responses with multiple classes and multiple levels of nesting. Biometrics. 1992; 48(3):939–50.
 19
Chao EC. Structured correlation in models for clustered data. Stat Med. 2006; 25(14):2450–68.
 20
Stoner JA, Leroux BG, Puumala M. Optimal combination of estimating equations in the analysis of multilevel nested correlated data. Stat Med. 2010; 29(4):464–73.
 21
McDonald BW. Estimating logistic regression parameters for bivariate binary data. J R Stat Soc Ser B. 1993; 55(2):391–397.
 22
Fitzmaurice GM. A caveat concerning independence estimating equations with multivariate binary data. Biometrics. 1995; 51(1):309–317.
Acknowledgements
This work was supported by funding from the United Kingdom Medical Research Council (MRC) and Department for International Development (DFID) (MR/K012126/1). The STOPMAL trial was funded by the UK Medical Research Council and registered as an International Standard Randomised Controlled Trial, number ISRCTN51184253. We would like to thank Richard Hayes for reviewing the manuscript.
Author information
Affiliations
Corresponding author
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
CB conducted the simulation study. CB and NA wrote the first draft of the manuscript. All authors reviewed the final draft of the manuscript.
Additional files
Additional file 1
Figure S1. Coverage of 95 % confidence intervals for the log odds ratio of an individuallevel predictor. The intracluster correlation of the predictor ranges between 0 (top) and 1 (bottom), for K=5 (left) or K=20 (right) villages. The remaining parameter values are α _{0}=log(0.1/0.9), α _{1}=σ _{ h }=log(2), I=5, J=20. (TIFF 30617 kb)
Additional file 2
Figure S2. Coverage of 95 % confidence intervals for the log odds ratio of a householdlevel predictor from a logistic regression model that includes multiple predictors (x1−x7). The intracluster correlation of the predictor ranges between 0 (top) and 1 (bottom), for K=5 (left) or K=20 (right) villages. The remaining parameter values are α _{0}=log(0.1/0.9), α _{1}=⋯=α _{7}=σ _{ h }=log(2), I=5, J=20. (TIFF 30617 kb)
Additional file 3
Figure S3. Coverage of 95 % confidence intervals for the log odds ratio of an individuallevel predictor from a logistic regression model that includes multiple predictors (x1−x7). The intracluster correlation of the predictor ranges between 0 (top) and 1 (bottom), for K=5 (left) or K=20 (right) villages. The remaining parameter values are α _{0}=log(0.1/0.9), α _{1}=⋯=α _{7}=σ _{ h }=log(2), I=5, J=20. (TIFF 30617 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Bottomley, C., Kirby, M.J., Lindsay, S.W. et al. Can the buck always be passed to the highest level of clustering?. BMC Med Res Methodol 16, 29 (2016). https://0doiorg.brum.beds.ac.uk/10.1186/s1287401601271
Received:
Accepted:
Published:
DOI: https://0doiorg.brum.beds.ac.uk/10.1186/s1287401601271
Keywords
 Clustering
 Multilevel
 Hierarchical
 Sandwich estimator
 Robust variance estimate
 Logistic regression