Use of partitioned GMM marginal regression model with time-dependent covariates: analysis of Chinese Longitudinal Healthy Longevity Study

Abstract

Background

The health of the elderly population is a major concern for most industrial nations. National health surveys provide a measure of the state of elderly health. One such survey is the Chinese Longitudinal Healthy Longevity Survey, which collects data on risk factors and outcomes among the elderly. We examine these longitudinal survey data to determine changes in health and to identify risk factors that affect health outcomes, including the ability of the elderly to complete a physical check.

Methods

We use a Partitioned GMM logistic regression model to identify risk factors. The model accounts for the correlation between lagged time-dependent covariates and the outcomes, and it relates both present and past measures of the time-dependent covariates to the outcomes. The partitioning produces additional regression coefficients that identify the immediate effect, the delayed effect (lag-1), the further delayed effect (lag-2), and so on. The model therefore gives decision makers the opportunity to monitor the effect of a covariate over time. This technique is particularly useful in healthcare and health-related research. We use the Chinese Longitudinal Healthy Longevity Survey data to identify those risk factors and to display the utility of the model.

Results

We found that the ability to make one’s own decisions, frequent consumption of vegetables, frequent exercise, the ability to transfer without assistance, visual difficulties and the ability to pick up a book from the floor while standing had effects of varying significance on health and on the ability to complete a physical check as people get older.

Conclusions

Partitioning the effect of each covariate into an immediate effect, a delayed effect and a further delayed effect provides important measures in a declining population.

Background

Longitudinal studies in medical-related research are useful in identifying changes in outcomes as impacted by certain risk factors. While the repeated measurements on subjects generate correlated observations, they are of different types of correlation. There is correlation among the responses. There is correlation between the time-dependent covariates and the response. These correlations have different impacts on the outcomes. Thus, any models fitted to these data need to address these correlations accordingly.

Modelling time-dependent covariates when analyzing binary outcomes in longitudinal studies has drawn attention. There are methods based on Generalized Estimating Equations (GEE) and others based on the Generalized Method of Moments (GMM) [1,2,3,4,5]. However, these methods do not separate out the impact of the time-dependent covariates on the outcomes. Instead, they provide estimates that represent an average of the impacts. Obermeier et al. [6] suggested that when modeling longitudinal data, one could not assume that the association between a time-dependent covariate and the outcome was only direct and simultaneous, because the outcome might depend on past measurements of the covariate. Thus, an alternative approach is to separate the different impacts of the covariate. Heagerty [7] indicated that one way to properly model longitudinal outcomes with time-dependent covariates is to include appropriate lagged values of such covariates. This approach requires an additional regression coefficient for each lag of a time-dependent covariate. These additional coefficients allow the effect of the covariate on the response to be parsed by lag, rather than assuming that the association maintains the same strength and direction over time. The approach provides insight into the effects of time-dependent covariates on present and future values of the outcomes.

Motivating example

The health of the elderly population is a major concern for most industrial nations. National health surveys provide a measure of the state of elderly health. One such survey is the Chinese Longitudinal Healthy Longevity Study (CLHLS) [8]. It collects data on risk factors and outcomes in the elderly population. The CLHLS was designed to identify key factors contributing to healthy longevity among elderly adults in China. The survey was conducted over time, but we concentrated on four waves: 2005, 2008, 2011 and 2014. This survey is of particular interest in China, as the annual growth rate of its elderly population is approximately 4.4% and approximately 20% of the world’s oldest population lives in China [8]. Gu, Zhang and Zeng [9] investigated the impact of adequate access to healthcare. Li, Zhang and Liang [10] used waves 1 and 2 to determine how living arrangements in 1998 impacted self-rated health in 2000. Zheng et al. [11] studied the associations of environmental variables. Wu and Schimmele [12] tested how levels of psychological disposition in 1998 impacted self-rated health in 2000. Wang, Zheng, Kurosawa and Inaba [13] studied gender and age differences in health among elderly Chinese using data collected in 2002. However, all of these studies used only one or two waves of data, so the researchers were only able to determine cross-sectional or lag-1 effects of time-dependent covariates on the outcomes.

In this paper, we made use of four waves to demonstrate the fit of Partitioned GMM for two binary simultaneous outcomes: completion of a physical check and health status. These responses were objectively measured by an interviewer. There are subjective measures, but we concentrated on the objective measures. We focused our attention on the longitudinal aspect of the data and used all four waves. Using more waves allows us to make fuller use of the longitudinal nature of the data.

Data

The data consisted of elderly people 64 years and older living in 22 of 31 provinces in China. There were 8084 observations measured on 2021 individuals over the four waves. We fit models to interviewer-rated health and completion of a physical check that included the time-independent covariate gender. These models also included the time-dependent covariates: able to make own decisions, consumed vegetables frequently, exercised, transfer without assistance, visual difficulty and ability to pick up a book from the floor while standing. Descriptive statistics for the outcomes and time-dependent covariates are given in Tables 1 and 2, respectively. Our initial observation suggested a steady decline over time in the percentage of interviewees considered healthy (Table 1).

Table 1 Descriptive Statistics for four outcomes (%)
Table 2 Descriptive Statistics for time-dependent covariates (%)

Methods

We fit a partitioned GMM logistic regression model [14] to the Chinese Longitudinal Healthy Longevity Study data to determine the effects of time-dependent covariates on the binary outcomes. The model measures the impact of time-independent and time-dependent covariates X on the outcome Y measured at four different time points. There are therefore relations between X and Y, other than cross-sectional ones, that must be addressed (Fig. 1). The partitioned GMM logistic regression model [14] provides coefficient estimates for the effect of X on Y when both are measured at the same time, when X is measured one time period before Y, when X is measured two time periods before Y and when X is measured three time periods before Y.
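To make the lag structure concrete, the following minimal Python sketch enumerates which (covariate time, outcome time) pairs contribute to each lag when there are four waves. The function name `lag_pairs` is hypothetical and is used only for illustration; it is not part of the %partitionedGMM macro.

```python
from collections import defaultdict


def lag_pairs(T):
    """Group (covariate time s, outcome time t) pairs with s <= t by lag t - s.

    Hypothetical helper for illustration only.
    """
    pairs = defaultdict(list)
    for t in range(1, T + 1):          # outcome measured at time t
        for s in range(1, t + 1):      # covariate measured at time s <= t
            pairs[t - s].append((s, t))
    return dict(pairs)


if __name__ == "__main__":
    pairs = lag_pairs(4)               # four waves, as in the CLHLS analysis
    for lag in sorted(pairs):
        print(f"lag-{lag}: {pairs[lag]}")
```

With four waves, the lag-0 coefficient draws on four pairs, lag-1 on three, lag-2 on two and lag-3 on a single pair (wave 1 covariate with wave 4 outcome), which is one reason estimates at longer lags tend to be less precise.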

Fig. 1 Relationship between time-dependent covariate X and outcome Y across four time periods

Partitioned GMM logistic regression models with time dependent covariates

Let \( {y}_{it} \) denote the binary observation for individual i (i = 1, …, N) at time t (t = 1, …, T). Let \( {\boldsymbol{x}}_{it}=\left({x}_{i1t},\dots, {x}_{iJt}\right) \) be a vector of J time-dependent covariates, where \( {x}_{ijt} \) is the jth covariate observed at time t for individual i. Assume that observations \( {y}_{is} \) and \( {y}_{kt} \) are independent when i ≠ k but not necessarily when i = k and s ≠ t. The Partitioned GMM logistic regression model accounts for the relationships between the outcomes \( {\boldsymbol{y}}_i=\left({y}_{i1},\dots, {y}_{iT}\right) \), observed at times t, and the jth covariate observed at time s, \( {x}_{ijs} \), for s ≤ t. For each subject i and each time-dependent covariate \( {x}_{ijt} \) measured at times t = 1, 2, …, T, the data matrix is reconfigured as a lower triangular matrix,

$$ {\boldsymbol{X}}_{ij}=\left[\begin{array}{cccc}{x}_{ij1}& 0& \dots & 0\\ {x}_{ij2}& {x}_{ij1}& \dots & 0\\ \vdots & \vdots & \ddots & \vdots \\ {x}_{ijT}& {x}_{ij\left(T-1\right)}& \dots & {x}_{ij1}\end{array}\right]=\left[{\boldsymbol{x}}_{ij}^{\left[0\right]}\kern0.5em {\boldsymbol{x}}_{ij}^{\left[1\right]}\kern0.5em \dots \kern0.5em {\boldsymbol{x}}_{ij}^{\left[T-1\right]}\right] $$

where the superscript denotes the difference t − s, in time periods, between the response time t and the covariate time s. In this matrix, \( {\boldsymbol{x}}_{ij}^{\left[0\right]} \) contains values of the time-dependent covariate observed at the same time as the outcome, \( {\boldsymbol{x}}_{ij}^{\left[1\right]} \) contains values of the time-dependent covariate observed one time period prior to the outcome, and so on, such that \( {\boldsymbol{x}}_{ij}^{\left[T-1\right]} \) consists of the values of the covariate measured T − 1 time periods prior to the outcome. Thus, the model for the outcome at time t with one time-independent covariate and one time-dependent covariate is

$$ logit\left({\mu}_{it}\right)={\beta}_0+{\beta}_F{x}_F+{\beta}_j^{tt}{x}_{ij t}+{\beta}_j^{\left[1\right]}{x}_{ij\left(t-1\right)}+{\beta}_j^{\left[2\right]}{x}_{ij\left(t-2\right)}+\dots +{\beta}_j^{\left[t-1\right]}{x}_{ij1} $$
(1)

while the model for all time periods in matrix form is

$$ logit\left[\begin{array}{c}{\mu}_{i1}\\ {\mu}_{i2}\\ \vdots \\ {\mu}_{iT}\end{array}\right]={\beta}_0\left[\begin{array}{c}1\\ 1\\ \vdots \\ 1\end{array}\right]+{\beta}_F{x}_F\left[\begin{array}{c}1\\ 1\\ \vdots \\ 1\end{array}\right]+{\beta}_j^{tt}{\boldsymbol{x}}_{ij}^{\left[0\right]}+{\beta}_j^{\left[1\right]}{\boldsymbol{x}}_{ij}^{\left[1\right]}+{\beta}_j^{\left[2\right]}{\boldsymbol{x}}_{ij}^{\left[2\right]}+\dots +{\beta}_j^{\left[T-1\right]}{\boldsymbol{x}}_{ij}^{\left[T-1\right]} $$

The coefficient \( {\beta}_j^{tt} \) denotes the effect of the covariate \( {x}_{ijt} \) on the response \( {Y}_t \) when both are observed in the same time period, while the coefficient \( {\beta}_F \) denotes the effect of the time-independent covariate \( {x}_F \) on the response \( {Y}_t \). When s < t, we denote the lagged effects of the covariate \( {x}_{js} \) on the response \( {Y}_t \) by the coefficients \( {\beta}_j^{\left[1\right]},{\beta}_j^{\left[2\right]},\dots, {\beta}_j^{\left[T-1\right]} \). In general, each of the J time-dependent covariates yields a maximum of T partitions of \( {\beta}_j \). Thus, for a model with J time-dependent covariates, the data matrix X has a maximum dimension of NT by (J × T) + 1, and β is a vector of maximum length (J × T) + 1.
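The reconfiguration of the data matrix can be sketched in a few lines of Python with NumPy. This is an illustration under stated assumptions, not the authors' SAS implementation: the function names `partition_covariate` and `design_matrix` are hypothetical, the input is assumed to be an array of shape (N, T, J) with no missing waves, and the time-independent covariate is left out so that the column count matches the (J × T) + 1 stated above.

```python
import numpy as np


def partition_covariate(x):
    """Return the T x T lower-triangular matrix for one subject and covariate.

    Column k holds the covariate lagged by k periods; entries with no
    earlier measurement available are set to 0. Hypothetical helper.
    """
    x = np.asarray(x, dtype=float)
    T = x.shape[0]
    X = np.zeros((T, T))
    for k in range(T):
        X[k:, k] = x[: T - k]   # row t, column k holds x at time t - k
    return X


def design_matrix(covariates):
    """Stack intercept and partitioned covariates for all subjects.

    covariates: array of shape (N, T, J). Returns shape (N*T, J*T + 1).
    """
    covariates = np.asarray(covariates, dtype=float)
    N, T, J = covariates.shape
    blocks = []
    for i in range(N):
        cols = [np.ones((T, 1))]                      # intercept column
        for j in range(J):
            cols.append(partition_covariate(covariates[i, :, j]))
        blocks.append(np.hstack(cols))
    return np.vstack(blocks)


if __name__ == "__main__":
    print(partition_covariate([1, 0, 1, 1]))
    rng = np.random.default_rng(0)
    toy = rng.integers(0, 2, size=(3, 4, 2))          # N=3, T=4, J=2 (toy data)
    print(design_matrix(toy).shape)                   # (12, 9)
```

For a single covariate series (1, 0, 1, 1) over four waves, the first column reproduces the series, the second shifts it down by one wave, and so on, which is the lower triangular layout described above.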

This method of estimating regression coefficients relies on valid moment conditions relating the covariate values at different times to the outcome at other times. The moment conditions are valid for cross-sectional measurements, where covariates are measured at the same time as the outcome [2]. However, the validity of moment conditions between lagged covariates and the outcomes needs to be tested. We do so through a test of bivariate correlation developed by Lalonde, Wilson and Yin [3]. Once the valid moments are identified, the regression parameters are estimated using a GMM approach [14]. We do not repeat the derivations here. Readers who want to see that development are referred to Lalonde, Wilson and Yin [3] and to Irimata, Broatch and Wilson [14]. We fit these models in SAS 9.4 using the %partitionedGMM macro (https://github.com/kirimata/Partitioned-GMM) [15], which includes the test for valid moment conditions [3].
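As a reminder of what a moment condition means here, the display below restates the general form used by Lai and Small [2] in the notation of this section. It is a sketch for orientation only: the logistic derivative on the right assumes the standard marginal logistic mean with coefficient \( {\beta}_j \) on \( {x}_{ijs} \), and the test statistic of Lalonde, Wilson and Yin [3] is not reproduced.

$$ E\left[\frac{\partial {\mu}_{is}\left(\boldsymbol{\beta}\right)}{\partial {\beta}_j}\left\{{y}_{it}-{\mu}_{it}\left(\boldsymbol{\beta}\right)\right\}\right]=0,\kern2em \frac{\partial {\mu}_{is}}{\partial {\beta}_j}={\mu}_{is}\left(1-{\mu}_{is}\right){x}_{ijs} $$

The condition holds when s = t; for s ≠ t its validity depends on how the covariate relates to the outcome over time, which is why each lagged pair is tested before it is used in estimation.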

In our analysis of data in CLHLS, we fit two partitioned GMM logistic regression models to model interviewer-rated health and interviewees’ ability to complete a physical check separately.

Results

Health

Immediate impacts were identified for vegetables (OR = 1.70 with 95% CI: 1.30, 2.23), exercise (OR = 2.03 with 95% CI: 1.52, 2.71), transfer without assistance (OR = 3.65 with 95% CI: 2.39, 5.59), having visual difficulties (OR = 0.64 with 95% CI: 0.49, 0.84) and pick up book from floor while standing (OR = 4.11 with 95% CI: 3.11, 5.43) (Table 3). For a one time-period lag (i.e. delayed effect), exercise (OR = 1.39 with 95% CI: 1.03, 1.89) and transfer without assistance (OR = 1.76 with 95% CI: 1.05, 2.95) significantly impacted the outcome. Across a two time-period lag (further delayed effect), transfer without assistance (OR = 0.44 with 95% CI: 0.24, 0.81) had a significant impact on this outcome (Table 3). There were no significant effects across a three time-period lag (furthermost delayed effect).
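For readers who want to move between the reported odds ratios and the underlying logistic coefficients, the following sketch shows the usual conversion. It assumes the intervals are Wald intervals computed on the log-odds scale with a critical value of 1.96, which the text does not state explicitly; the function names are hypothetical.

```python
import math


def coef_from_or(or_est, ci_low, ci_high, z=1.96):
    """Back out the log-odds coefficient and its standard error.

    Assumes a symmetric Wald interval on the log scale (an assumption,
    not something stated in the paper).
    """
    beta = math.log(or_est)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


def or_with_ci(beta, se, z=1.96):
    """Return (odds ratio, lower limit, upper limit) from a coefficient."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))


if __name__ == "__main__":
    # Immediate effect of frequent vegetable consumption on health:
    # OR = 1.70 with 95% CI 1.30 to 2.23 (values reported above).
    beta, se = coef_from_or(1.70, 1.30, 2.23)
    print(f"beta = {beta:.3f}, se = {se:.3f}")
    print(tuple(round(v, 2) for v in or_with_ci(beta, se)))
```

An interval that excludes 1 on the odds ratio scale corresponds to a coefficient interval that excludes 0, which is the sense in which the effects above are described as significant.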

Table 3 Results of partitioned GMM model for interviewer-rated health and ability to complete physical check

Complete physical check

Immediate impacts were obtained for making own decisions (OR = 1.61 with 95% CI: 1.05, 2.48), transfer without assistance (OR = 13.83 with 95% CI: 8.23, 23.27), visual difficulties (OR = 0.39 with 95% CI: 0.25, 0.61) and pick up book from floor while standing (OR = 5.88 with 95% CI: 4.04, 8.54) (Table 3, Fig. 3).

Further impacts were seen at lag-2 for transfer without assistance (OR = 4.30 with 95% CI: 1.78, 10.43). An additional delayed impact at lag-3 was seen for eating vegetables frequently (OR = 2.12 with 95% CI: 1.04, 4.33) (Table 3, Fig. 3).

Discussion

The partitioned GMM logistic regression model is distinctive in that it allows the immediate effect as well as the future effects of time-dependent covariates on outcomes to be measured. In previous studies, researchers who analyzed the CLHLS data were only able to estimate cross-sectional or lag-1 effects of time-dependent covariates. In contrast, we were able to determine cross-sectional and lag-1 associations as well as lag-2 and lag-3 relationships between the time-dependent covariates and our two binary outcomes (Table 4).

Table 4 Positive and negative effects over time of time-dependent covariates on interviewer-rated health and physical check completion

Figure 2 presents the relationships between the time-dependent covariates and interviewer-rated health over time. We found that gender and the ability to make one’s own decisions did not impact the probability of good health. Frequent consumption of vegetables increased the likelihood of good health immediately, but did not have any significant lagged effects. Exercising significantly increased the likelihood of being in good health immediately and in the next time period. The ability to transfer without assistance had a positive impact on good health immediately and in the next time period, but a negative association across a two time-period lag (Table 3). Having visual challenges had an immediate negative impact on good health. The ability to pick up a book from the floor while standing had an immediate positive impact on good health.

Fig. 2 Partitioned GMM model for interviewer-rated health (odds ratio estimates and 95% confidence intervals)

Gender did not significantly impact the likelihood of completing a physical check. The ability to make one’s own decisions had an immediate positive impact on completing a physical check. Frequent consumption of vegetables in the first wave significantly increased the likelihood of completing a physical check in the last wave. Exercising did not impact the completion of a physical check at any point in time. The ability to transfer without assistance significantly increased the likelihood of completing a physical check immediately and across a two time-period lag. Having visual challenges negatively impacted completing a physical check immediately. Being able to pick up a book from the floor while standing immediately increased the probability of completing a physical check. Figure 3 presents the changing relationships between the time-dependent covariates and the ability to complete a physical check.

Fig. 3 Partitioned GMM model for ability to complete physical check (odds ratio estimates and 95% confidence intervals)

Conclusions

Though we fitted the Partitioned GMM model to two binary outcomes, this model readily accommodates continuous outcomes. The partitioning of the data matrix, with the use of additional coefficients, provides an opportunity to measure the effect of the covariate on the responses at different time periods.

Availability of data and materials

The dataset analyzed during the current study is available at the Inter-University Consortium for Political and Social Research repository, https://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/36692.

Abbreviations

CLHLS:

Chinese Longitudinal Healthy Longevity Study

GMM:

Generalized Method of Moments

WPA:

World Population Aging

OR:

Odds Ratio

CI:

Confidence Interval

References

  1. Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73(1):13–22.

  2. Lai TL, Small D. Marginal regression analysis of longitudinal data with time-dependent covariates: a generalised method of moments approach. J R Stat Soc Ser B. 2007;69(1):79–99.

  3. Lalonde TL, Wilson JR, Yin J. GMM logistic regression models for longitudinal data with time-dependent covariates. Stat Med. 2014;33(27):4756–69.

  4. Guerra M, Shults J, Amsterdam J, Ten Have T. The analysis of binary longitudinal data with time-dependent covariates. Stat Med. 2012;31(10):931–48.

  5. Zhou Y, Lefante J, Rice J, Chen S. Using modified approaches on marginal regression analysis of longitudinal data with time dependent covariates. Stat Med. 2014;33(19):3354–64.

  6. Obermeier V, Scheipl F, Heumann C, Wassermann J, Küchenhoff H. Flexible distributed lags for modelling earthquake data. J R Stat Soc Ser C Appl Stat. 2015;64(2):395–412.

  7. Heagerty PJ. Marginalized transition models and likelihood inference for longitudinal categorical data. Biometrics. 2002;58(2):342–51.

  8. Zeng Y, Vaupel J, Xiao Z, Liu Y, Zhang Z. Chinese Longitudinal Healthy Longevity Survey (CLHLS), 1998-2012. Ann Arbor, MI.

  9. Gu D, Zhang Z, Zeng Y. Access to healthcare services makes a difference in healthy longevity among older Chinese adults. Soc Sci Med. 2009;68(2):210–9.

  10. Li L, Zhang J, Liang J. Health among oldest old in China: which living arrangements make a difference. Soc Sci Med. 2009;68(2):220–7.

  11. Zheng Y, Gu D, Purser J, Hoenig H, Christakis N. Associations of environmental factors with elderly health and mortality in China. Am J Public Health. 2010;100(2):298–305.

  12. Wu Z, Schimmele C. Psychological disposition and self-reported health among the oldest-old in China. Ageing Soc. 2006;26(1):135–51.

  13. Wang D, Zheng J, Kurosawa M, Inaba Y. Relationship between age and gender differentials in health among older people in China. Ageing Soc. 2009;29(7):1141–54.

  14. Irimata KM, Broatch J, Wilson JR. Partitioned GMM logistic regression models for longitudinal data. Stat Med. 2019;38(12):2171–83.

  15. Irimata KM, Wilson JR. Using SAS to estimate lagged coefficients with the %partitionedGMM macro. In: SAS Global Forum 2018 Conference Proceedings. Denver, CO; 2018.

Acknowledgements

Not Applicable.

Declarations

Not Applicable.

Funding

Not Applicable.

Author information

Contributions

EVA conducted literature review and final data analysis, created tables and figures, wrote methods, results, discussion and conclusion sections. DX identified outcomes of interest and conducted initial analysis and data cleaning. JRW revised and edited all drafts of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Elsa Vazquez-Arreola.

Ethics declarations

Ethics approval and consent to participate

Not Applicable.

Consent for publication

Not Applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Cite this article

Vazquez-Arreola, E., Xue, D. & Wilson, J.R. Use of partitioned GMM marginal regression model with time-dependent covariates: analysis of Chinese Longitudinal Healthy Longevity Study. BMC Med Res Methodol 20, 128 (2020). https://0-doi-org.brum.beds.ac.uk/10.1186/s12874-020-01003-0
