A comparison of statistical methods for modeling count data with an application to hospital length of stay

Fernandez, Gustavo A.; Vatcheva, Kristina P.

doi:10.1186/s12874-022-01685-8

Research
Open access
Published: 04 August 2022

BMC Medical Research Methodology volume 22, Article number: 211 (2022)

Abstract

Background

Hospital length of stay (LOS) is a key indicator of hospital care management efficiency, cost of care, and hospital planning. Hospital LOS is often used as a measure of a post-medical procedure outcome, as a guide to the benefit of a treatment of interest, or as an important risk factor for adverse events. Therefore, understanding hospital LOS variability is always an important healthcare focus. Hospital LOS data can be treated as count data, with discrete and non-negative values, typically right skewed, and often exhibiting excessive zeros. In this study, we compared the performance of the Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regression models using simulated and empirical data.

Methods

Data were generated under different simulation scenarios with varying sample sizes, proportions of zeros, and levels of overdispersion. Analysis of hospital LOS was conducted using empirical data from the Medical Information Mart for Intensive Care database.

Results

Results showed that the Poisson and ZIP models performed poorly in overdispersed data. ZIP outperformed the rest of the regression models when the overdispersion was due to zero-inflation only. The NB and ZINB regression models faced substantial convergence issues when incorrectly used to model equidispersed data. The NB model provided the best fit in overdispersed data and outperformed the ZINB model in many simulation scenarios with combinations of zero-inflation and overdispersion, regardless of the sample size. In the empirical data analysis, we demonstrated that fitting incorrect models to overdispersed data led to incorrect regression coefficient estimates and overstated significance of some of the predictors.

Conclusions

Based on this study, we recommend that researchers consider ZIP models for count data with zero-inflation only and NB models for overdispersed data or data with combinations of zero-inflation and overdispersion. If the researcher believes there are two different data generating mechanisms producing zeros, then the ZINB regression model may provide greater flexibility when modeling the zero-inflation and overdispersion.

Background

In healthcare, length of stay (LOS) is a key indicator used to assess the hospital care management efficiency, cost of care, quality control, appropriate use of hospital services and resources, and hospital planning [1,2,3,4,5,6]. The need for efficient hospital management has been exemplified with the recent onset of the 2019 coronavirus/COVID-19 pandemic. Health crises like these show the best interest of patients, hospitals, and public health is in the efficient management of hospital stays while ensuring adequate bed capacity and that clinician time can be provided for patients with other conditions [7]. Reducing LOS improves financial, operational, and clinical outcomes by decreasing the costs of care for a patient and minimizing the risk of hospital-acquired conditions [8, 9]. In some hospitals, administrators benefit from using predictive models to assist with planning and resource allocation for deliveries [9]. Clinics optimize clinical settings by implementing analytical applications leading to timely and accurate decision making while reducing the hospital LOS [8,9,10]. Hospital LOS is often used as a measure of a post-medical procedure outcome, as a guide to the benefit of a treatment of interest, and/or as an important risk factor for adverse events, hospital readmission, and mortality [11,12,13]. Therefore, understanding hospital LOS variability across various patients’ clinical and socio-demographic characteristics and hospitals’ characteristics, such as geographic region and hospital sizes, is always an important public health focus [9, 14,15,16,17,18,19,20,21,22].

Inpatient hospital LOS is the number of nights spent in hospital, calculated from the day of admission to the day of discharge [23]. This type of data can be treated as count data, and count data values are usually nonnegative with a typically right-skewed distribution, often exhibiting excessive zeros and overdispersion [17, 24, 25]. Different analytic strategies have been used for modeling hospital LOS. However, the best way to model LOS and other right-skewed data has been debated in the literature. A literature review showed that non-transformed or logarithm-transformed count outcome variables are often modeled with linear regression [26,27,28]. Linear regression is usually employed for continuous outcomes that are normally or approximately normally distributed. LOS data rarely adhere to these assumptions. Studies conducted to compare analyses of logarithm-transformed count outcome variables have reported several issues that might arise with such transformations, including zero values not being considered, meaningless negative predicted values for the outcome variable, uninterpretable and biased parameter estimates, and inconsistent inferences about important policy parameters [29, 30]. Gardner et al. [31] showed that when the mean of the count outcome variable is small, linear regression produces biased standard errors and hence biased significance tests. Using a simulation study, O’Hara [32] found that the log-transformations of count data often used to satisfy parametric test assumptions performed poorly, except when the dispersion was small and the mean counts were large. When the mean count is very small and zero is the most common value in the data set, normalization with a log transformation will not work and the mode will always be at the lowest value [33]. Bryk et al. [34] stated that there are important cases for which the assumptions of linearity and normality are not realistic, and no transformation can make them so.

An alternative approach that researchers have used to handle the non-normality of the LOS outcome variable is to dichotomize LOS and use logistic regression to predict it [35]. Dichotomizing a count outcome variable leads to a loss of information. Based on simulated and empirical data analyses, Sroka [36] concluded that more precise odds ratio estimates can be obtained using count regression models with a log-odds link function. In summary, linear regression models with or without logarithmic transformation of a count outcome variable, and logistic regression models on a dichotomized count outcome variable, are subject to criticism for their inadequacy in modeling this type of data. These approaches can lead to biased parameter estimates, prediction of meaningless negative values, and the loss of precision of inferences and of important information about the underlying counts.

Common statistical methods for analysis of count data are Poisson, negative binomial (NB), zero-inflated Poisson (ZIP), and zero-inflated negative binomial (ZINB) regressions [24, 37,38,39,40]. The results from the existing research evaluating the performance of regression models for count data are conflicting regarding which model is preferred. Lambert (1992) compared ZIP to NB regression models in an experimental study concerning soldering defects on printed wiring boards where 81% of the board areas had 0 defects. He found that ZIP was better than the NB model in terms of prediction accuracy [37]. Greene (1994) compared Poisson, NB, ZIP, and ZINB models on empirical consumer loan behavior data characterized by overdispersion and zero-inflation. In the analysis the author found that the NB model was superior to the ZIP model and the ZIP model was superior to the Poisson model in terms of model fit [41]. Slymen et al. (2006) compared Poisson, overdispersed Poisson, NB, ZIP and ZINB regression models in assessing predictors of vigorous physical activity among Latina women using data with 82% zeros in the outcome variable. They reported little difference between the fits of the ZIP and ZINB models; overall, however, the ZIP model fitted best [42]. In overdispersed and zero-inflated data on the number of incidents involving human papillomavirus infection, Lee et al. (2012) found that ZIP, followed by NB and ZINB, had the smallest Akaike information criterion (AIC), and that the ZIP model showed the same results as the NB model regarding the covariates at a 0.05 significance level. In addition, the ZINB model did not always converge [43]. Tuzen et al. (2018) examined the fit of Poisson, NB, ZIP, ZINB, Poisson Hurdle and NB Hurdle models under various outlier and zero-inflation scenarios in simulated data and found that ZINB and NB Hurdle were superior to the Poisson, NB, and ZIP models. They also reported that in some scenarios, the NB model outperformed all models in the presence of outliers and/or excess zeros [44]. Tlhaloganyang et al. [45] compared NB with ZIP and ZINB models using different real datasets characterized by overdispersion and zero-inflation. The authors found that NB provided a superior fit in all datasets [45].

Based on the reviewed literature, the question remains open as to whether the different results in terms of model fit arise from the different proportions of zeros, levels of overdispersion, and sample sizes of the datasets used in these studies. In this study we had two objectives. The first objective was to compare the performance of Poisson, NB, ZIP, and ZINB regression models in a simulation study. The second objective was to compare the performance of Poisson, NB, ZIP, and ZINB regression models using real-life hospital data in assessing the effect of age, sex, health insurance status, and type of hospital admission on the hospital LOS. This research added to previous studies by including additional experimental scenarios, such as varying sample sizes, larger dispersion levels, various proportions of zeros in the outcome variable, and data generated using Poisson and ZIP distributions, along with NB and ZINB distributions.

Methods

Overview of count data regression models

Poisson model

The most widely used and the most basic model that explicitly considers the nonnegative integer-valued aspect of the count outcome variable is the Poisson regression model [46]. Let ${Y}_{i}, i=1,\dots ,n$, be random variables for the number of occurrences of the event of interest, with realizations ${y}_{i}=0, 1, 2\dots$. Let ${{\varvec{X}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}=\left({X}_{1i}, \dots , {X}_{ki}\right)$ be a k-dimensional random vector of predictors, with realization ${{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}=\left({x}_{1i}, \dots , {x}_{ki}\right), i=1,\dots ,n$. Poisson regression assumes that the dependent variable ${Y}_{i}$, given ${{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}$, i = 1, …, n, is independently Poisson-distributed with:

$$P\left({{Y}_{i}=y}_{i}|{{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{{\varvec{i}}}\right)=\frac{{e}^{-{\mu }_{i}}{\mu }_{i}^{{y}_{i}}}{{y}_{i}!}, {y}_{i}=0, 1, 2, \dots$$

(1)

and the mean parameter (i.e., the mean number of events per period) is given by:

$${\mu }_{i}={e}^{{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}\beta }$$

(2)

where $\beta$ is a column vector of parameters.

In the Poisson regression model the conditional mean and the conditional variance of Y_i are equal (equidispersion):

$${E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right)=V\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right)=\mu }_{i}$$

(3)

The Poisson regression model is also called a log-linear model because the logarithm of the conditional mean is linear in the parameters:

$${\mathrm{ln}(E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right))=\mathrm{ln}(\mu }_{i})={{\varvec{x}}}_{i}^{^{\prime}}\beta$$

(4)

The marginal effect of a predictor variable ${X}_{j}$ is given by:

$$\frac{\partial E\left({Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}\right)}{\partial {x}_{ji}}={{\beta }_{j}e}^{{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}\beta }={\beta }_{j}E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right)$$

(5)

and the interpretation of this effect is that a one-unit change in the j-th predictor leads to a proportionate change of ${\beta }_{j}$ in the conditional mean $E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right)$; equivalently, the conditional mean is multiplied by ${e}^{{\beta }_{j}}$.
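As a minimal sketch of Eqs. (1), (2) and (5), the Python snippet below evaluates the Poisson probability and the conditional mean for one observation and compares the mean before and after a one-unit increase in a predictor. The coefficient and covariate values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math

def poisson_mean(x, beta):
    """Conditional mean mu_i = exp(x_i' beta), as in Eq. (2)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def poisson_pmf(y, mu):
    """Poisson probability P(Y = y) with mean mu, as in Eq. (1), computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

# Hypothetical coefficients (intercept, x1, x2); not taken from the paper.
beta = [0.5, 0.01, -0.2]
x = [1.0, 60.0, 1.0]        # the leading 1.0 multiplies the intercept
x_plus = [1.0, 61.0, 1.0]   # same covariates with x1 increased by one unit

mu = poisson_mean(x, beta)
mu_plus = poisson_mean(x_plus, beta)

rate_ratio = mu_plus / mu        # multiplicative change in the mean, exp(beta_1)
marginal_effect = beta[1] * mu   # Eq. (5): beta_j * E(Y | x)

print(f"mean = {mu:.4f}, rate ratio = {rate_ratio:.6f}, marginal effect = {marginal_effect:.6f}")
```

The ratio of the two means depends only on the coefficient of the changed predictor, which is why exponentiated Poisson coefficients are commonly reported as rate ratios.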

Real-life count data often exhibit two (related) characteristics: overdispersion and zero-inflation. Overdispersion refers to an excess of variability in the data (i.e., the variance exceeds the mean), while zero-inflation refers to an excess of zeros [39, 47]. In the presence of overdispersion, the Poisson regression model is not adequate and can lead to biased parameter estimates and unreliable standard error estimates [38, 39]. The most commonly used model that accounts for overdispersion is the negative binomial model.

Negative binomial model

The Poisson regression model can be generalized by introducing an unobserved heterogeneity term for observation i. The subjects are assumed to differ randomly in a manner that is not fully accounted for by the observed covariates. This is formulated as:

$${E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i},{\tau }_{i}\right)=\mu }_{i}{\tau }_{i}={e}^{{{\varvec{x}}}_{{\varvec{i}}}^{\boldsymbol{^{\prime}}}\beta +{\varepsilon }_{i}}$$

(6)

where the unobserved heterogeneity term ${\tau }_{i}={e}^{{\varepsilon }_{i}}$ is independent of the vector of predictor variables ${{\varvec{x}}}_{{\varvec{i}}}$. Then the conditional distribution of ${Y}_{i}$ given ${{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}$ and ${\tau }_{i}$ is Poisson with conditional mean and conditional variance ${\mu }_{i}{\tau }_{i}$:

$$f\left({{Y}_{i}=y}_{i}|{{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{{\varvec{i}}},{\tau }_{i}\right)=\frac{{e}^{-{\mu }_{i}{\tau }_{i}}{({\mu }_{i}{\tau }_{i})}^{{y}_{i}}}{{y}_{i}!}, {y}_{i}=0, 1, 2, \dots$$

(7)

The negative binomial distribution is derived as a gamma mixture of Poisson random variables [39, 48,49,50]. By letting $g\left({\tau }_{i}\right)$ be the probability density function of ${\tau }_{i}$, the distribution $f\left({{Y}_{i}=y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}} \right)$ is obtained by integrating the product of $f\left({{Y}_{i}=y}_{i}|{{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{{\varvec{i}}},{\tau }_{i}\right)$ and $g\left({\tau }_{i}\right)$ with respect to ${\tau }_{i}.$ The analytical solution of the integral exists if ${\tau }_{i}$ is gamma distributed, and this solution is the NB distribution. Specifically, ${\tau }_{i}$ is assumed to follow a gamma(θ, θ) distribution, so that $E\left({\tau }_{i}\right)=1$ and $V\left({\tau }_{i}\right)=\frac{1}{\theta }.$ It can be shown that the NB distribution can be written as:

$$f\left({{Y}_{i}=y}_{i}|{{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{{\varvec{i}}}\right)=\frac{\Gamma \left({y}_{i}+{\alpha }^{-1}\right)}{{y}_{i}!\Gamma \left({\alpha }^{-1}\right)}{\left(\frac{{\alpha }^{-1}}{{\alpha }^{-1}+{\mu }_{i}}\right)}^{{\alpha }^{-1}}{\left(\frac{{\upmu }_{\mathrm{i}}}{{\alpha }^{-1}+{\mu }_{i}}\right)}^{{\mathrm{y}}_{\mathrm{i}}}, {y}_{i}=0, 1, 2, \dots$$

(8)

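where ${\alpha }^{-1}=\theta$ and $\theta >0$ is the parameter of the gamma(θ, θ) mixing distribution (both its shape and its rate, which is what gives mean 1 and variance $\frac{1}{\theta }$).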
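As a sketch of the integration step behind Eq. (8), assume the gamma(θ, θ) density is parameterized by shape θ and rate θ, which is the parameterization that yields $E\left({\tau }_{i}\right)=1$ and $V\left({\tau }_{i}\right)=\frac{1}{\theta }$:

$$g\left({\tau }_{i}\right)=\frac{{\theta }^{\theta }}{\Gamma \left(\theta \right)}{\tau }_{i}^{\theta -1}{e}^{-\theta {\tau }_{i}}, {\tau }_{i}>0$$

Multiplying by the Poisson probability in (7) and integrating over ${\tau }_{i}$ leaves a gamma integral:

$$f\left({Y}_{i}={y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}\right)=\frac{{\theta }^{\theta }{\mu }_{i}^{{y}_{i}}}{{y}_{i}!\Gamma \left(\theta \right)}{\int }_{0}^{\infty }{\tau }_{i}^{{y}_{i}+\theta -1}{e}^{-\left({\mu }_{i}+\theta \right){\tau }_{i}}d{\tau }_{i}=\frac{{\theta }^{\theta }{\mu }_{i}^{{y}_{i}}}{{y}_{i}!\Gamma \left(\theta \right)}\cdot \frac{\Gamma \left({y}_{i}+\theta \right)}{{\left({\mu }_{i}+\theta \right)}^{{y}_{i}+\theta }}$$

Regrouping the powers of $\theta$ and ${\mu }_{i}$ gives

$$f\left({Y}_{i}={y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}\right)=\frac{\Gamma \left({y}_{i}+\theta \right)}{{y}_{i}!\Gamma \left(\theta \right)}{\left(\frac{\theta }{\theta +{\mu }_{i}}\right)}^{\theta }{\left(\frac{{\mu }_{i}}{\theta +{\mu }_{i}}\right)}^{{y}_{i}}$$

which is Eq. (8) after substituting $\theta ={\alpha }^{-1}$.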

The NB conditional mean and conditional variance of the outcome variable ${y}_{i}$ are given by:

$${E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right)=\mu }_{i}$$

(9)

$$V\left({Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}\right)={\mu }_{i}(1+\alpha {\mu }_{i})>E\left({Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}\right)$$

(10)

The parameter $\alpha$ is defined as the dispersion parameter. As $\alpha$ approaches zero (i.e., the gamma parameter $\theta$ approaches infinity), $V\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right)$ decreases to ${\mu }_{i}=E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}\right)$, and the NB distribution approaches the Poisson distribution. Thus, the Poisson regression model is nested within the NB regression model.
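The moments in Eqs. (9) and (10) and the Poisson limit can be checked numerically. The sketch below, written in Python with only the standard library, evaluates the NB probability in Eq. (8) on the log scale and computes the mean and variance by a truncated sum. The values of the mean and dispersion are illustrative assumptions, and the helper names are hypothetical.

```python
import math

def nb_pmf(y, mu, alpha):
    """NB probability from Eq. (8) with theta = 1 / alpha, computed on the log scale."""
    theta = 1.0 / alpha
    log_p = (math.lgamma(y + theta) - math.lgamma(y + 1) - math.lgamma(theta)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)

def poisson_pmf(y, mu):
    """Poisson probability from Eq. (1), computed on the log scale."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def moments(pmf, upper=5000):
    """Mean and variance from a truncated sum over y = 0, ..., upper - 1."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu, alpha = 3.0, 1.0  # illustrative values, not taken from the paper
nb_mean, nb_var = moments(lambda y: nb_pmf(y, mu, alpha))
print(nb_mean, nb_var, mu * (1 + alpha * mu))  # variance should match mu * (1 + alpha * mu)

# With a very small dispersion the NB probabilities are close to the Poisson ones.
max_gap = max(abs(nb_pmf(y, mu, 1e-6) - poisson_pmf(y, mu)) for y in range(30))
print(max_gap)
```

The truncation point `upper` is a convenience of this sketch; it needs to be large enough that the omitted tail probability is negligible for the chosen mean and dispersion.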

Zero-inflated count models

Zero-inflated count models provide a way to model both the excess zeros and the overdispersion (He et al. 2014) [51]. In particular, there are two possible data generation processes for the number of occurrences of the event of interest ${y}_{i}$ for each observation i = 1,…,n, and the result of a Bernoulli trial is used to determine which of the two to use. For observation i, Process 1 is chosen with probability ${\varphi }_{i}$ and Process 2 with probability $1-{\varphi }_{i}$. Process 1 generates only zero counts (“structural” zeros). Process 2 generates counts from either a Poisson model [37] or a NB model [41]. $P\left({Y}_{i}={y}_{i}|{{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{{\varvec{i}}}\right)$ can be described as follows:

$$P\left({Y}_{i}={y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}\right)=\left\{\begin{array}{ll}{\varphi }_{i}+(1-{\varphi }_{i})\mathrm{g}(0)& \text{if } {y}_{i}=0\\ (1-{\varphi }_{i})\mathrm{g}({y}_{i})& \text{if } {y}_{i}>0\end{array}\right.$$

(11)

where $\mathrm{g}({y}_{i})$ follows either Poisson or NB distributions, defined in (1) and (8), respectively, and therefore the zero-inflated count models are called either zero-inflated Poisson (ZIP) or zero-inflated negative binomial (ZINB) regression models, respectively.

Further, if ${\varphi }_{i}$ depends on the characteristics of observation i, then ${\varphi }_{i}={F}_{i}=F({z}_{i}^{^{\prime}}\gamma )$, where ${z}_{i}$ is a (q + 1)-dimensional vector of zero-inflated covariates and $\gamma$ is a (q + 1)-dimensional vector of zero-inflated regression coefficients to be estimated. The function F is called zero-inflated link function.

In the case of the ZIP regression model, the conditional expectation and the conditional variance of the outcome variable ${Y}_{i}$ are given by:

$${E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}, {z}_{i}\right)=\mu }_{i}(1-{F}_{i})$$

(12)

$$V\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}, {z}_{i}\right)=E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}, {z}_{i}\right)(1+{F}_{i}{\mu }_{i})>E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}, {z}_{i}\right)$$

(13)

Since $V\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{{\varvec{i}}}, {z}_{i}\right)>E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{{\varvec{i}}}, {z}_{i}\right)$, the ZIP model exhibits overdispersion as well.

In the case of ZINB regression model, the conditional expectation and the conditional variance of the outcome variable ${Y}_{i}$ are given by:

$${E\left({{Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={\varvec{x}}}_{i}, {z}_{i}\right)=\mu }_{i}(1-{F}_{i})$$

(14)

$$V\left({Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}, {z}_{i}\right)=E\left({Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}, {z}_{i}\right)\left[1+(\alpha +{F}_{i}){\mu }_{i}\right]$$

(15)

Since $V\left({Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}, {z}_{i}\right)>E\left({Y}_{i}|{{\varvec{X}}}_{{\varvec{i}}}={{\varvec{x}}}_{{\varvec{i}}}, {z}_{i}\right)$, the ZINB model, like the ZIP model, exhibits overdispersion. Just as the NB distribution converges to the Poisson distribution as $\alpha$ approaches zero, the ZINB distribution converges to the ZIP distribution as $\alpha$ approaches zero.
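The zero-inflated probability in Eq. (11) and the moments in Eqs. (12) to (15) can be illustrated with a short Python sketch. It builds the ZIP and ZINB probabilities from the Poisson and NB probabilities and compares the numerically computed mean and variance with the closed-form expressions. The parameter values and helper names are assumptions made for this illustration only.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability, Eq. (1)."""
    return math.exp(-mu + y * math.log(mu) - math.lgamma(y + 1))

def nb_pmf(y, mu, alpha):
    """NB probability, Eq. (8), with theta = 1 / alpha."""
    theta = 1.0 / alpha
    return math.exp(math.lgamma(y + theta) - math.lgamma(y + 1) - math.lgamma(theta)
                    + theta * math.log(theta / (theta + mu))
                    + y * math.log(mu / (theta + mu)))

def zi_pmf(y, phi, base_pmf):
    """Eq. (11): point mass at zero with weight phi, mixed with a count distribution."""
    p = (1.0 - phi) * base_pmf(y)
    return phi + p if y == 0 else p

def moments(pmf, upper=2000):
    """Mean and variance from a truncated sum over y = 0, ..., upper - 1."""
    probs = [pmf(y) for y in range(upper)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

mu, phi, alpha = 3.0, 0.3, 1.0  # illustrative values, not taken from the paper

zip_mean, zip_var = moments(lambda y: zi_pmf(y, phi, lambda k: poisson_pmf(k, mu)))
zinb_mean, zinb_var = moments(lambda y: zi_pmf(y, phi, lambda k: nb_pmf(k, mu, alpha)))

expected_mean = mu * (1 - phi)                                # Eqs. (12) and (14)
expected_zip_var = expected_mean * (1 + phi * mu)             # Eq. (13)
expected_zinb_var = expected_mean * (1 + (alpha + phi) * mu)  # Eq. (15)

print(zip_mean, zip_var, expected_mean, expected_zip_var)
print(zinb_mean, zinb_var, expected_mean, expected_zinb_var)
```

In both cases the variance exceeds the mean, with the ZINB variance larger than the ZIP variance by the contribution of the dispersion parameter.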

Generalized linear models

Poisson, NB, ZIP, and ZINB are all part of the Generalized Linear Models (GLMs). The term GLM refers to a large class of models first introduced by Nelder and Wedderburn [52] and further developed and explained by McCullagh and Nelder [53]. GLMs extend standard linear regression models to encompass non-normal response distributions and possibly nonlinear functions of the mean [40]. The ordinary linear regression model uses linearity to describe the relationship between the mean of the response variable and a set of explanatory variables, with inference assuming that the response distribution is normal [40]. GLMs have three components: 1) A random component, which specifies the response variable ${Y}_{i}$ for the i-th observation and its probability distribution. 2) A linear component, ${\eta }_{i}={{\varvec{X}}}_{i}^{^{\prime}}\beta$, where $\beta$ is a column vector of parameters and ${{\varvec{X}}}_{i}$ is a column vector of predictors for the i-th observation. 3) A monotonic differentiable link function g(.) describing how the expected value of ${Y}_{i}$ is related to the linear predictor ${\eta }_{i}$: $g\left[E\left({Y}_{i}\right)\right]=g({\mu }_{i})={{\varvec{X}}}_{i}^{^{\prime}}\beta$ [40]. The response variables ${Y}_{i}$ are independent for i = 1, 2, …, n and have a probability distribution from an exponential family. This implies that the variance of the response variable ${Y}_{i}$ depends on the mean ${\mu }_{i}$ through a variance function V: $var\left({Y}_{i}\right)=\frac{\phi V\left({\mu }_{i}\right)}{{\omega }_{i}},$ where $\phi$ is a constant, known as the dispersion parameter, and ${\omega }_{i}$ is a known weight for each observation. The link function g for the Poisson, NB, ZIP and ZINB regression models is the log (${\eta }_{i}=\mathrm{log}({\mu }_{i})$). The binary link function h for the model of the probability of a zero count, in the case of the ZIP and ZINB regression models, is one of the logit, probit, or complementary log–log.
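To make the role of the link functions concrete, the sketch below maps a linear predictor to a mean count through the inverse of the log link, and to a zero-inflation probability through the inverses of the logit, probit, and complementary log–log links. The covariate and coefficient values are hypothetical, and the function names are chosen for this illustration only.

```python
import math

def inv_log(eta):
    """Inverse of the log link: mean count mu = exp(eta)."""
    return math.exp(eta)

def inv_logit(eta):
    """Inverse of the logit link (logistic function)."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse of the probit link (standard normal CDF)."""
    return 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0)))

def inv_cloglog(eta):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def linear_predictor(covariates, coefs):
    """Inner product of a covariate vector and a coefficient vector."""
    return sum(c * b for c, b in zip(covariates, coefs))

# Hypothetical zero-inflation covariates z_i (with intercept) and coefficients gamma.
z = [1.0, 0.5]
gamma = [-1.0, 0.8]
eta_zero = linear_predictor(z, gamma)

phi_by_link = {
    "logit": inv_logit(eta_zero),
    "probit": inv_probit(eta_zero),
    "cloglog": inv_cloglog(eta_zero),
}
for name, phi in phi_by_link.items():
    print(f"{name}: structural-zero probability = {phi:.4f}")
```

All three inverse links return a value strictly between 0 and 1, but they differ in shape, so the same coefficients imply different structural-zero probabilities under each choice.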

Simulation study

Dataset generation

Several datasets with one dependent variable y and two predictor variables x₁ and x₂ were generated from the following four distributions: Poisson, NB, ZIP, and ZINB. Variable x₁ was continuous and generated from a normal distribution with mean µ = 57.3 and variance σ² = 306.25, representing the distribution of the variable age observed in the Medical Information Mart for Intensive Care (MIMIC-III) dataset for patients with an asthma diagnosis [54,55,56]. The binary variable x₂ was generated from a Bernoulli distribution with probability of success p = 0.43, representing the distribution of the variable sex in the MIMIC-III dataset for patients with an asthma diagnosis. The values of the population regression coefficients β₀, β₁, and β₂ were pre-specified and obtained by fitting a NB regression model for the outcome variable hospital LOS in the same MIMIC-III dataset. For each of the simulated datasets under the Poisson, NB, ZIP, and ZINB distributions, four different sample size scenarios were considered (50, 200, 600, and 1000). For count data generated with the NB or ZINB distribution, different levels of dispersion (0.01, 1, 5, and 10) were considered under each of the sample size scenarios. For count data generated with the ZIP or ZINB distribution, different proportions of structural zeros (0.1, 0.3, 0.5, and 0.7) were considered under each of the sample size scenarios and, for data generated under the ZINB distribution, under each of the dispersion level scenarios. To minimize the impact of simulation error, each scenario was repeated 1000 times. A summary of the simulation scenarios considered in the study is shown in Table 1.
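The data-generating scheme described above can be sketched in Python using only the standard library. This is an illustration under stated assumptions rather than the code used in the study: the regression coefficients in `BETA` are hypothetical placeholders (the fitted values are not reported in this section), the function names are invented for this sketch, and the Poisson sampler is a simple chunked version of Knuth's method. Setting `alpha=0` gives Poisson or ZIP data, and setting `phi=0` gives Poisson or NB data.

```python
import math
import random

# Hypothetical coefficients (beta0, beta1, beta2); the study's values are not given here.
BETA = (1.0, 0.005, -0.1)

def draw_poisson(rng, lam, chunk=30.0):
    """Poisson draw via Knuth's multiplication method, applied in chunks.

    A sum of independent Poisson draws is Poisson with the summed mean, so splitting
    lam into chunks avoids underflow of exp(-lam) when lam is large.
    """
    total = 0
    remaining = lam
    while remaining > 0.0:
        step = min(chunk, remaining)
        limit = math.exp(-step)
        prod = rng.random()
        while prod > limit:
            total += 1
            prod *= rng.random()
        remaining -= step
    return total

def simulate_counts(n, beta=BETA, alpha=0.0, phi=0.0, seed=0):
    """Generate n rows (x1, x2, y) from a Poisson, NB, ZIP, or ZINB model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(57.3, math.sqrt(306.25))  # age-like predictor
        x2 = 1 if rng.random() < 0.43 else 0     # sex-like binary predictor
        mu = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2)
        if rng.random() < phi:
            y = 0  # structural zero (Process 1)
        else:
            # Gamma heterogeneity with mean 1 and variance alpha (shape 1/alpha, scale alpha).
            tau = rng.gammavariate(1.0 / alpha, alpha) if alpha > 0 else 1.0
            y = draw_poisson(rng, mu * tau)
        rows.append((x1, x2, y))
    return rows

# One ZINB scenario: n = 1000, dispersion 1, structural-zero proportion 0.3.
data = simulate_counts(1000, alpha=1.0, phi=0.3, seed=1)
zero_fraction = sum(1 for _, _, y in data if y == 0) / len(data)
print(f"observed proportion of zeros: {zero_fraction:.3f}")
```

The observed proportion of zeros exceeds the structural-zero proportion because the count process itself also produces zeros. Repeating the call with different seeds, sample sizes, dispersion levels, and zero proportions would reproduce the grid of scenarios in spirit.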

Table 1 Simulation scenarios considered in the simulation study
