
Joint model robustness compared with the time-varying covariate Cox model to evaluate the association between a longitudinal marker and a time-to-event endpoint

Abstract

Background

Recent progress in medical research has generated increasing interest in the use of longitudinal biomarkers for characterizing the occurrence of an outcome. The present work is motivated by a study whose objective was to explore the potential of long pentraxin 3 (PTX3) as a prognostic marker of acute Graft-versus-Host Disease (GvHD) after haematopoietic stem cell transplantation. The time-varying covariate Cox model has commonly been used, despite its limiting assumptions that marker values are constant between measurement times and measured without error. The joint model has been developed as a viable alternative; however, it is computationally intensive and requires additional strong assumptions, and the impact of their misspecification has not been sufficiently studied.

Methods

We conduct an extensive simulation study to clarify the assumptions relevant for understanding joint models and to assess their robustness under key model misspecifications. Further, we characterize the extent of bias introduced by the limiting assumptions of the time-varying covariate Cox model and compare its performance with that of the joint model in various contexts. We then apply the two approaches to evaluate the potential of PTX3 as a prognostic marker of GvHD after haematopoietic stem cell transplantation.

Results

Overall, we illustrate that the joint model provides an unbiased estimate of the association between a longitudinal marker and the hazard of an event in the presence of measurement error, improving over the time-varying covariate Cox model. However, the joint model is severely biased when the baseline hazard or the shape of the longitudinal trajectory is misspecified. Both the Cox model and the correctly specified joint model indicated PTX3 as a potential prognostic marker of GvHD, with the joint model providing a higher hazard ratio estimate.

Conclusions

Joint models are beneficial for investigating the ability of a longitudinal marker to characterize a time-to-event endpoint. However, the benefits are strictly linked to the correct specification of the marker trajectory and of the baseline hazard function, so these assumptions must be considered carefully to avoid biased estimates.


Background

Recent progress in molecular biology and genetics has generated increasing interest in genomic and molecular biomarkers as markers of diagnosis, prognosis or response to treatment. Longitudinal measurements of biomarkers are useful for characterizing the occurrence of an outcome of interest, as they can be predictive of treatment response or related to the event process and prognosis. As an example, the present work is motivated by a study whose objective was to explore the potential of long pentraxin 3 (PTX3) as a prognostic marker of acute Graft-versus-Host Disease (GvHD) after haematopoietic stem cell transplantation [1].

The time-varying covariate Cox model (TVCM) [2, 3] has been used to study the association between an observed longitudinal biomarker and the hazard of an event [1, 4]. Since the marker is observed only at discrete times (i.e., the times of measurement), this approach carries the last observation forward (LOCF), which introduces bias given the continuous nature of the biomarker [5]. Further, the TVCM fails to account for the so-called measurement error in the biomarker. As evidenced by various studies (e.g., [6, 7]), failure to adjust for such measurement error introduces further bias into the model estimates.

Shared frailty joint models address these issues by simultaneously modeling the marker profile and the time-to-event data [8, 9]. Within such approaches, a linear mixed model for the underlying longitudinal trajectory of the marker is linked to the survival model through shared random effects [10]. This allows inference on the association between the hazard of an event and the longitudinal biomarker, avoiding the LOCF assumption and accounting for random measurement error [11]. However, joint models are parametric and thus require additional strong assumptions over the semi-parametric Cox model with a time-varying covariate [12]. Assumptions are needed on the distribution of the marker, on its trajectory, and on the shape of the hazard function of the event of interest.

The literature evaluating the impact of joint model misspecification in biomedical applications is particularly scarce, while methodological developments are accumulating rapidly (e.g., [13]). This causes a lack of clarity on practical issues, which in turn discourages applied researchers from deepening their understanding of such models [14, 15]. Few simulation studies have been performed in the joint modeling framework. [16] investigated the use of joint models to adjust for measurement error in the baseline measurement only. The simulation by [11] evaluated the performance of the joint model and the TVCM focusing on a treatment effect on the time-to-event outcome, while [17] focused on the association between marker and event under a few specific scenarios. A broader simulation study that evaluates the impact of model misspecifications, and that could help applied statisticians understand the advantages and disadvantages of a joint model as compared with a Cox model in different contexts, is lacking. Moreover, the distinct roles of the bias due to LOCF and to measurement error in the TVCM have not received attention in previous studies. In this paper, we conduct a comprehensive simulation study with the following goals: (a) to disentangle the bias introduced by LOCF and measurement error when assessing the association between a marker and a time-to-event endpoint by the TVCM, and to compare its performance with that of a joint model; (b) to clarify the relevant assumptions of the joint model and to assess its robustness in the presence of key model misspecifications, in particular misspecification of the marker distribution, of the marker trajectory, and of the shape of the hazard function. These theoretical considerations are then used to evaluate the potential of PTX3 as a prognostic marker of GvHD after haematopoietic stem cell transplantation.

In the “Method” section below, we describe the TVCM and the joint model approaches. In the “Simulation study” section we present the simulation studies: the simulation protocol, the key model misspecification scenarios and a discussion of the associated results. In the “Motivating context” section, we present an application illustrating the use of PTX3 as a marker of GvHD with both the TVCM and the joint model. Concluding remarks are given in the “Discussion” section.

Method

Notation

Let \(T^{*}_{i}\) be the failure time of subject i (i=1,…,n) in a cohort of size n. Suppose that we want to estimate the association between a biomarker wi(t), which varies in time, and the hazard of failure. In practice, the longitudinal biomarker is measured at discrete times tij, j=1,…,ni. Thus, the biomarker information coming from the i-th subject is a vector of observed discrete values, possibly subject to measurement error εi(t): {yi(tij)=wi(tij)+εi(tij), j=1,…,ni}. Since survival times are commonly affected by right censoring, the observed survival time is \(T_{i}=\text {min}(T^{*}_{i}, C_{i})\), where Ci is the right censoring time and \(\delta _{i}=I(T^{*}_{i}\leq C_{i})\) is the event indicator. \(T^{*}_{i}\) and Ci are assumed to be independent conditional on the biomarker trajectory wi(t), as commonly done in survival analysis (e.g., [18]).

The time-varying covariate Cox model

The TVCM is a generalization of the Cox model [2] accounting for covariates that can change value during the observation time. The proportional hazards model has the form

$$ h_{i}(t)=h_{0}(t)\exp\{\alpha y_{i}(t)\} $$
(1)

where h0(t) denotes an unspecified baseline hazard and α is a parameter measuring the association between the observed longitudinal measure yi(t) and the hazard at time t, hi(t). A vector of fixed baseline covariates can also be included in model (1). The hazard ratio HR= exp(α) is interpreted as the relative increase in the hazard at any time t for a unit increase in the observed value of the biomarker at the same time point. The HR is assumed to be constant in time; that is, the relative increase in the hazard for each unit increase in the biomarker is the same over the whole observation period. Inference is based on maximizing the partial likelihood [3]. Of note, when yi(t) is not observed at time t, the most recently observed value is used, yi(tij) for tij ≤ t < ti,j+1, following the LOCF principle [8].
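
In practice, the TVCM is fitted on data expanded to the counting-process format, with one row per subject and measurement interval. A minimal sketch in R with the survival package is given below; the data frame long_data and its variables are hypothetical, assuming the marker value observed at the start of each interval is carried forward over that interval.

```r
# Sketch only: TVCM on counting-process data (hypothetical data frame and variables).
# Each row of long_data covers an interval (start, stop] with the event indicator
# 'event' and the last observed marker value carried forward over the interval.
library(survival)

tvcm_fit <- coxph(Surv(start, stop, event) ~ marker, data = long_data)
summary(tvcm_fit)  # exp(coef) is the estimated HR for a unit increase in the marker
```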

Joint models

A joint model of longitudinal and survival data comprises two linked submodels: the longitudinal and the survival submodels [10, 19]. The longitudinal submodel specifies the trajectory of a biomarker over time. This is typically achieved using a linear mixed effects model [20] of the form:

$$ y_{i}(t)=w_{i}(t) + \epsilon_{i}(t)=\boldsymbol{\beta}^{T}\boldsymbol{f}_{i}(t) + \boldsymbol{b}^{T}_{i}\boldsymbol{g}_{i}(t)+ \epsilon_{i}(t) $$
(2)

in which fi(t) and gi(t) are vectors of functions of time t for the fixed effect parameters β and the random effect parameters bi, respectively. The component εi(t) denotes mutually independent normally distributed error terms with variance \(\sigma ^{2}_{\epsilon }\). For the random effects, one assumes bi ∼ MVN(0,Σ), where Σ is the inter-subject variance-covariance matrix. Further, the random effects are assumed to be independent of the error terms. In model (2) the observed marker value yi(t) at time point t is decomposed into the underlying true marker value wi(t) and a random error term. The survival submodel associates the true marker value with the hazard of an event at the same time point t through the proportional hazards model:

$$ h_{i}(t)=h_{0}(t)\exp\{\alpha w_{i}(t)\} $$
(3)

As in (1), the parameter α measures the association between the longitudinal biomarker and the time-to-event, and the hazard ratio HR= exp(α) is assumed constant in time. A vector of fixed baseline covariates can be included in this model as well. The basic difference from (1) is that model (3) does not use the observed value of the biomarker yi(t), but an estimate of the true value wi(t), which is continuously updated in time and obtained by maximizing the joint likelihood of the time-to-event and longitudinal marker outcomes. Note that an appropriate estimate of the subject-specific trajectory wi(t) requires correct specification of the design vectors fi(t) and gi(t). The optimization procedure involves a hybrid of expectation maximization (EM) and direct maximization, as discussed in [10]. Unlike in the TVCM of (1), the baseline hazard must be specified parametrically or approximated by spline-based approaches; leaving it completely unspecified within the joint modeling framework severely underestimates the standard errors of the parameter estimates [21]. While the association parameter in both (3) and (1) is denoted by α, the corresponding estimates from the two models will generally differ.
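
For concreteness, the sketch below shows how such a joint model can be fitted with the R package JM, which is used for the analyses in this paper; the data frames long_data (one row per measurement) and surv_data (one row per subject), the baseline covariate x1 and all object names are hypothetical.

```r
# Sketch only: joint model with the JM package (hypothetical data and variable names).
library(nlme)      # longitudinal submodel (linear mixed model)
library(survival)  # survival submodel (Cox model)
library(JM)

# Longitudinal submodel (2): random intercept and random slope for the marker
lme_fit <- lme(marker ~ time, random = ~ time | id, data = long_data)

# Survival submodel (3): one row per subject; x = TRUE keeps the design matrix for JM
cox_fit <- coxph(Surv(Time, status) ~ x1, data = surv_data, x = TRUE)

# Joint model linking the submodels through the current true marker value w_i(t);
# here the baseline hazard is assumed Weibull (spline baselines are used later)
jm_fit <- jointModel(lme_fit, cox_fit, timeVar = "time", method = "weibull-PH-aGH")
summary(jm_fit)    # the coefficient of the current marker value is the association alpha
```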

Simulation study

In this section, we conduct a simulation study under various scenarios in order to address two aims: (a) disentangling the bias introduced by LOCF and measurement error when assessing the association between a marker and a time-to-event endpoint by the TVCM, and comparing its performance with that of the joint model; (b) clarifying the relevant assumptions of the joint model and assessing its robustness in the presence of model misspecifications. In the joint modeling framework, the association between the longitudinal marker and the hazard of an event relies on several assumptions on the longitudinal and survival submodels, including the marker distribution, the marker trajectory and the shape of the hazard function. The impact of misspecifying these assumptions is illustrated in sections b1, b2 and b3, respectively. Table 1 summarizes the main parameter values used for the simulation scenarios, which are described below. All simulations and analyses were performed using the R package JM version 1.4.7.

Table 1 Summary of the simulation protocol comprising main parameter values, marker and survival time distributions used for each of the simulation scenarios

Simulations protocol

We considered a sample size of n=300 subjects with regular measurements of the biomarker over 14 weeks, including the baseline measurement (t=0,…,14). The simulation setting was inspired by the motivating context of the data in the “Motivating context” section. Data were generated by the following steps:

  1. The general formula to obtain the true marker value wi(t) was given as

    $$ \begin{aligned} w_{i}(t) &= \beta_{0}+\beta_{1}t+\beta_{2}t^{2}+b_{i0}+b_{i1}t+b_{i2}t^{2}, \\ \boldsymbol{b}_{i} &= (b_{i0},b_{i1},b_{i2})^{T}\sim N_{3}(\boldsymbol{0}, \Sigma), \end{aligned} $$
    (4)

    where Σ denotes the 3×3 inter-subject variance-covariance matrix. When a linear decreasing trajectory was considered, as for the majority of the scenarios reported in Table 1, the fixed effect parameters were chosen to be β0=3.2, β1=−0.07 and β2=0. A basic scenario with a biomarker constant in time was also considered by setting β1=β2=0 (scenario 1, Table 1). To assess the misspecification of the marker distribution (b1), a random intercept model was considered with bi0 generated from four different probability distributions: a bimodal mixture of two normal distributions (hereafter called Bimodal), a Chi-square, a Gamma and a Normal distribution (scenarios 3 to 6). The parameter values of these distributions were chosen such that their variances equaled the random intercept variance Σ11=1.44. Model (4) was used to investigate misspecification of the marker trajectory (b2) by generating biomarker values with a quadratic profile in scenarios 7 and 8, as depicted in Fig. 2a.

    Fig. 1 Mean-squared error (MSE) of the association parameter α obtained from the joint model and the TVCM fitted to data generated considering different sample sizes (n) and different probability distributions for the random effect bi0

    Fig. 2 a Mean biomarker trajectory for the different scenarios: linearly decreasing (scenarios 2-6 and 9) and quadratic shape with slight (scenario 7) and gross (scenario 8) misspecifications with respect to the linear trend. b Baseline hazard function for scenarios 1-8 (Weibull) and 9 (non-monotonic shape)

  2. The observed marker value yi(t) at time t was obtained as yi(t)=wi(t)+ε, where ε represents a normally distributed measurement error \(\epsilon \sim N(0,\sigma ^{2}_{\epsilon })\), with increasing standard deviation σε ∈ {0.1, 0.3, 0.5}, corresponding to a coefficient of variation (CV), defined as the standard deviation of the measurement error divided by the mean (e.g., [22]), of 3.1%, 9.4% and 15.6%, respectively. Regular measurements of wi(t) were obtained with increasing frequency, from one measurement per week (t=0,1,…,14) to 4 measurements per week (t=0,0.25,…,14), in order to examine the effect of LOCF in the TVCM.

  3. The survival time \(T^{*}_{i}\) was obtained from a Weibull proportional hazards model: \(h_{i}(t)=\lambda \rho t^{\rho -1} \exp \{\alpha w_{i}(t)\}\), where ρ=1.4 and λ=0.1. The association parameter was set at α ∈ {0, 0.3, 0.6}, corresponding to no, moderate and strong association between wi(t) and hi(t), respectively. The survival time was generated by inverting the cumulative hazard (see [23]). Since this does not lead to a closed-form expression, we used the R root-finder function uniroot to generate \(T^{*}_{i}\) numerically (a sketch of steps 1-3 is given after this protocol). To investigate the impact of misspecifying the shape of the hazard function on the association parameter α (b3), in scenario 9 the survival times were generated from a non-monotonic baseline hazard function \(h_{0}(t)=\nu \kappa t^{\kappa -1}/(c+t^{\kappa })\), where ν=1, κ=2 and c=10. The shape of this function, along with the Weibull hazard previously described, is shown in Fig. 2b.

  4. The censoring time Ci was generated from a uniform distribution on (0,14), leading to a censoring proportion of around 20% before week 14.

  5. The observed survival time \(T_{i}=\min(T^{*}_{i},C_{i})\) was then calculated.

  6. Marker values yi(t) with t>Ti were disregarded.

We drew B=1000 samples for each scenario; B was chosen to obtain at least a 2% level of accuracy in the estimate of the association parameter α in about 95% of the samples, assuming a true association parameter of 0.6 with standard error 0.14 [24]. To each generated dataset, we fitted the following models: i) the basic Cox model considering only the baseline measurement of the marker, yi(t=0); ii) the TVCM considering the observed updated value of the marker; iii) the joint model considering the updated value of the marker. We summarized the results using: the mean of the simulation estimates (Est), the empirical Monte Carlo standard error (ESE), the asymptotic standard error (ASE), the percentage bias (%Bias = bias/α) and the 95% coverage probability (CP) of the association parameter α. We also report the bias and the mean-squared error (MSE) where necessary. The ASE was computed as the average of the estimated standard errors, and the ESE as the standard deviation of the estimates of α.
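
A minimal sketch of the data-generating steps 1-3 for a single subject under scenario 2 (linear trajectory, Weibull baseline hazard) is given below; the random-slope variance, the uniroot search interval and all object names are our own illustrative choices rather than the exact values used in the study.

```r
# Sketch only: generate one subject under scenario 2 (illustrative values and names).
set.seed(1)
beta    <- c(3.2, -0.07)              # fixed intercept and slope (Table 1)
Sigma   <- diag(c(1.44, 0.01))        # random-effects covariance; slope variance assumed
b       <- MASS::mvrnorm(1, mu = c(0, 0), Sigma = Sigma)
alpha   <- 0.3; lambda <- 0.1; rho <- 1.4; sigma_e <- 0.3

w <- function(t) (beta[1] + b[1]) + (beta[2] + b[2]) * t        # true trajectory w_i(t)

# Step 2: weekly observed values contaminated with measurement error
visit_times <- 0:14
y_obs <- w(visit_times) + rnorm(length(visit_times), sd = sigma_e)

# Step 3: survival time by inverting the cumulative hazard, H(T) = -log(U)
haz     <- function(t) lambda * rho * t^(rho - 1) * exp(alpha * w(t))
cum_haz <- function(t) integrate(haz, lower = 0, upper = t)$value
u       <- runif(1)
T_star  <- uniroot(function(t) cum_haz(t) + log(u),
                   interval = c(1e-8, 200))$root                # numerical inversion as in [23]
```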

Results

a) Impact of measurement error and last observation carried forward

Table 2 shows the results for the constant biomarker case (scenario 1 of Table 1). The TVCM and the baseline Cox model show very similar performance, with bias increasing as the measurement error increases. This is expected, given that the biomarker mean value does not change over time. In the presence of small measurement error (σε=0.1), the joint model estimate showed a higher bias, indicating that a joint model is less beneficial in the presence of small measurement error and a constant biomarker. However, when σε was increased to 0.3 and 0.5, the bias in the joint model estimates was smaller than that of the TVCM, reflecting the ability of the joint model to account for measurement error.

Table 2 Results on the association parameter α obtained from the baseline Cox model, the TVCM and the joint model fitted to data generated considering a constant biomarker (scenario 1 of Table 1), α ∈ {0, 0.3, 0.6} and σε ∈ {0.1, 0.3, 0.5} with CV (3.1%, 9.4%, 15.6%). Mean of the maximum likelihood estimates (Est), empirical Monte Carlo standard error (ESE), asymptotic standard error (ASE), percentage bias (%Bias) and 95% coverage probabilities (CP) are shown

Table 3 shows the results under scenario 2 (linearly decreasing marker), with α ∈ {0, 0.3, 0.6}. The ESE (not reported) was always in close agreement with the ASE. When α was set at 0, the three models performed similarly well regardless of the size of σε. In the other scenarios, we observe increasing bias and decreasing coverage probabilities for the TVCM (one measurement per week) as the magnitude of σε increases. With σε=0.1 and α=0.3, the percentage bias was −2.3% and the coverage 95%. This percentage bias rose to −19%, and the coverage dropped to 80%, when σε increased to 0.5, while it reduced to −0.7% when the number of measurements was increased to four per week, thus reducing the impact of LOCF. The advantage of using the joint model was observed in the presence of high measurement error, where the percentage bias of −19% (TVCM) was reduced to 0.3%. The joint model, fitted using the parametric Weibull baseline hazard, provided the least biased estimates, with coverage probabilities much closer to 95% across all the scenarios. We note that the performance of the TVCM deteriorates further in the presence of a strong association between the marker and the time-to-event. For instance, with α=0.6 and σε=0.5, a large percentage bias, −21%, and a very small coverage, 35%, were observed for the TVCM (one measurement per week). In the latter setting, the improvement achieved by increasing the number of measurements was small.

Table 3 Results on the association parameter α obtained from the baseline Cox model, the TVCM and the joint model fitted to data generated considering the linear marker trajectory (scenario 2 of Table 1) with α ∈ {0, 0.3, 0.6} and σε ∈ {0.1, 0.3, 0.5} with CV (3.1%, 9.4%, 15.6%). Mean of the maximum likelihood estimates (Est), asymptotic standard error (ASE), bias, percentage bias (%Bias) and 95% coverage probabilities (CP) are shown

b) Results under model misspecification

b1) Marker distribution

In joint modeling, the marker distribution is typically assumed to be Gaussian (e.g., [16]). Violation of this assumption is a key concern, as the random effects play a central role in characterizing the association between the biomarker and the hazard of an event [10]. The simulation study in this section assesses the effect of misspecifying the distribution of the random effects according to scenarios 3 to 6 of Table 1. A random intercept model was used, with the random intercept bi0 generated from three non-normal distributions and a reference Normal distribution, while the joint model was fitted assuming a normally distributed random intercept in the longitudinal submodel. Five different sample sizes of 35, 75, 150, 300 and 600 subjects were considered in this setting. The measurement error standard deviation was held fixed at σε=0.3 and the true association parameter at α=0.3. The results of the simulation are shown in Table 4. The joint model failed to converge for a few simulations with small sample sizes: 6/1000 when the data were generated using the Bimodal distribution with n=35 and 1/1000 for n=75. These non-converging simulations were excluded from the analyses. When the marker was generated from a non-normal distribution, the joint model produced a biased estimate of α for n=35, with a percentage bias of 22%, 17% and 7.7% when the random intercept was generated from the Chi-square, Gamma and Bimodal distributions, respectively. However, the percentage bias decreased as the sample size n increased, reaching at most 3.7% with n=600 subjects, and the coverage probabilities were close to the optimal 95% across all distributions. Further, both the ESE and the ASE decreased as the sample size increased. Thus, the estimate of the association between the longitudinal marker and the hazard of an event is not substantially affected by the misspecification of the random effect distribution, as long as the sample size is large.

Table 4 Results on the association parameter α obtained from the joint model and the TVCM fitted to data generated considering sample sizes n ∈ {35, 75, 150, 300, 600} and different probability distributions (scenarios 3 to 6 of Table 1) for the random effect bi0 with variance Σ11=1.44, α=0.3 and σε=0.3 with CV = 9.4%

The TVCM is relatively less biased and more precise in the estimate of α for small sample sizes, indicating that it can provide good accuracy even though the marker is contaminated with measurement error (σε=0.3). Figure 1 shows the MSE for the joint model and the TVCM under the four distributions. The MSE reflects the accuracy of each model, taking into account both the bias and the variability [24]. For small sample sizes, the TVCM has lower MSE, except in the Normal case where the MSE of the two models is the same. As the sample size increases, the MSE of the two models coincides.
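
Explicitly, over the B simulation replicates the MSE combines the squared bias and the Monte Carlo variability of the estimates:

$$ \text{MSE}(\hat{\alpha})=\frac{1}{B}\sum_{b=1}^{B}\left(\hat{\alpha}_{b}-\alpha\right)^{2}\approx \text{Bias}^{2}+\text{ESE}^{2} $$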

b2) Marker trajectory

In order to appropriately characterize the association between the marker and the hazard of an event, estimation of the subject-specific trajectory wi(t) from (2) must capture the underlying shape. To evaluate the impact of misspecification of the marker profile on the estimate of α, we generated longitudinal trajectories which were quadratic in nature and fitted a joint model assuming linear trajectories with random intercept and random slope. We considered a slight and a gross departure from linearity, with parameters specified in scenarios 7 and 8 of Table 1, respectively. Figure 2a illustrates the mean longitudinal profile under both scenarios.

Table 5 reports the results of the simulation study under marker trajectory misspecification. The table includes the TVCM fitted to the observed longitudinal marker measured four times per week. A lack of convergence was encountered for the joint model under gross misspecification: the frequencies of non-convergence were 16/1000 and 13/1000 for σε=0.3 and σε=0.5, respectively. Further, one extreme outlying estimate was obtained for each of the two σε values; the two outliers were excluded from the results shown in Table 5. The impact of the marker trajectory misspecification is clearly visible in the estimates of the joint model. For σε=0.3, we observe a percentage bias of −5.3% for the joint model under slight misspecification. This corresponds to an extra 5% bias compared with the same scenario when the marker shape was correctly specified (see Table 3). The extra bias could be as large as 8.7% under gross misspecification. This indicates that the longitudinal trajectory of the marker must be carefully specified when a joint model is used to estimate the association between a longitudinal biomarker and a time-to-event. In the event of gross misspecification, the TVCM provides less biased estimates even in the presence of moderate measurement error in the biomarker.

Table 5 Results on the association parameter α estimated from the TVCM and the joint model fitted to data generated considering slight and gross misspecifications of the longitudinal trajectories (scenarios 7 and 8 of Table 1), σε ∈ {0.1, 0.3, 0.5} with CV (3.1%, 9.4%, 15.6%) and the true α=0.3

b3) Hazard shape function

Within the joint model framework, leaving the baseline hazard unspecified severely underestimates the standard errors of the parameter estimates [21]. The hazard function of the survival submodel is therefore often assumed to be Weibull (e.g., [25]), but the evolution of the hazard rate over time can easily be non-monotonic (e.g., [26, 27]). To investigate the impact of misspecifying the shape of the hazard function on the association parameter α, we generated data following a non-monotonic hazard (scenario 9 in Table 1) and fitted the joint model assuming three baseline hazard shapes: constant, Weibull and splines. For the spline case, the baseline hazard was defined using B-splines (e.g., [28]) with 5 internal knots placed at equally-spaced percentiles of the observed survival time Ti. Table 6 reports the results considering α ∈ {0.3, 0.6} and σε ∈ {0.1, 0.3, 0.5}. The performance of the TVCM was comparable to that in the previous scenarios (see Table 3), while the accuracy of the joint model depended strictly on the assumption on the hazard shape. The joint model with constant hazard produced severely biased estimates: for example, with σε=0.1, α=0.3 was underestimated by 39% with a coverage of 39%, and none of the confidence intervals contained the true value when α was set at 0.6. Thus, even if a constant hazard can be appealing for ease of computation, it often is not a realistic assumption. When the joint model was fitted to the generated data assuming a Weibull hazard, the estimate of α was also biased in all the scenarios; for α=0.3 and σε=0.1, α was overestimated by 12%. Joint models based on spline functions provided the least biased estimates of α, with coverage probabilities closer to 95% in most scenarios. The flexibility of the spline functions allowed the underlying non-linear shape of the baseline hazard to be captured.
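
In the JM package, the baseline hazard assumption is selected through the method argument of jointModel. The sketch below reuses the hypothetical lme_fit and cox_fit objects from the earlier sketch; the control option for the number of internal knots reflects our reading of the package documentation and should be checked against ?jointModel.

```r
# Sketch only: the same submodels combined under different baseline hazard assumptions.
jm_pw <- jointModel(lme_fit, cox_fit, timeVar = "time",
                    method = "piecewise-PH-aGH")       # piecewise-constant baseline
jm_wb <- jointModel(lme_fit, cox_fit, timeVar = "time",
                    method = "weibull-PH-aGH")         # Weibull baseline
jm_sp <- jointModel(lme_fit, cox_fit, timeVar = "time",
                    method = "spline-PH-aGH",          # B-spline baseline hazard
                    control = list(lng.in.kn = 5))     # 5 internal knots (assumed option name)
```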

Table 6 Results on the association parameter α obtained from the joint model and the TVCM fitted to data generated considering a non-monotonic baseline hazard function (scenario 9 of Table 1), α ∈ {0.3, 0.6} and σε ∈ {0.1, 0.3, 0.5} with CV (3.1%, 9.4%, 15.6%)

Motivating context

The example comes from a study in which patients with haemato-oncological diseases who underwent haematopoietic stem cell transplantation (HSCT) were evaluated to explore the potential of long pentraxin 3 (PTX3) as a prognostic marker of acute Graft-versus-Host Disease (GvHD) [1]. Acute GvHD is one of the major causes of morbidity and mortality associated with allogeneic stem cell transplants [29]. Currently, the diagnosis of GvHD is based on clinical signs and symptoms and, in uncertain cases, requires invasive biopsies of disease target organs, which are sometimes unfeasible. To improve the diagnosis and prognosis of GvHD, recent research has focused on specific biomarkers measured in the plasma or serum of HSCT patients as a new tool for detecting GvHD prior to clinical manifestation and for GvHD management. PTX3 is an acute-phase protein rapidly produced by vascular endothelial cells, mesenchymal cells and fibroblasts, as well as by innate immune response cells, upon stimulation with pro-inflammatory cytokines, damaged tissue-derived signals and microbial antigens. Unlike other acute-phase proteins, such as C-reactive protein, PTX3 is considered a rapid marker of primary local activation of innate immunity and inflammation, owing to its peculiar pattern of production.

In this section, we compare the use of the TVCM and the joint model for the evaluation of PTX3 as a marker of GvHD. Peripheral blood samples were collected in a cohort of 116 patients before the beginning of the conditioning regimen, on day 0 (HSCT), weekly after HSCT until the 14th week, and at the development of symptoms consistent with GvHD. Plasma was obtained after centrifuging whole blood and PTX3 was evaluated by a sandwich ELISA assay, with a declared measurement precision (intra-assay CV) lower than 10%. The median follow-up time was 5 weeks. Time was measured from HSCT up to the occurrence of GvHD; censoring occurred if a subject died before GvHD or was lost to follow-up. The follow-up ended at the 14th week.

Figure 3a displays the distribution of the PTX3 marker over time, showing a decreasing trend and a departure of the distribution from normality. The average PTX3 at week 0 was 29.46 ng/ml (nanograms per milliliter) with a standard deviation of 31.5 ng/ml. The GvHD hazard was estimated using the bshazard package [30] and plotted in Fig. 3b, which shows a highly non-monotonic shape of the GvHD hazard. We fitted a TVCM and a joint model to evaluate the association between the marker and the hazard of GvHD. Consistently with the simulation study, we also considered the basic Cox model that uses only the baseline information, observed at t=0, as a covariate. For the joint model, the longitudinal PTX3 trajectory was specified using a linear mixed model with random intercept and random slope, chosen as the best model according to the AIC when compared with a mixed model including a quadratic time term. The baseline hazard within the joint model was specified as constant, Weibull or B-splines with 6 internal knots placed at equally-spaced percentiles of the event times. Each model was fitted considering both the original PTX3 and its logarithmic transformation, the latter to satisfy the normality assumption of the linear mixed model.
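
As an illustration of the trajectory selection step, a sketch of the AIC comparison for the longitudinal submodel is shown below; the data frame ptx3_data and its variables are hypothetical, with one row per PTX3 measurement and the log-transformed marker as response.

```r
# Sketch only: choosing the PTX3 trajectory by AIC (hypothetical data and variable names).
library(nlme)

lme_lin  <- lme(log(ptx3) ~ week, random = ~ week | id,
                data = ptx3_data, method = "ML")
lme_quad <- lme(log(ptx3) ~ week + I(week^2), random = ~ week | id,
                data = ptx3_data, method = "ML")

AIC(lme_lin, lme_quad)   # the random intercept/slope model with linear time had the lower AIC
```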

Fig. 3 a The distribution of the PTX3 marker in time. b The shape of the estimated GvHD hazard

The results are shown in Table 7, which reports the estimated association between PTX3 and GvHD (Est), the standard error of the estimate (SE), the hazard ratio (HR), and the 95% confidence interval of the HR (95% HR CI). The baseline marker did not show a significant association with the hazard of GvHD. The updated values of PTX3 appear to be positively associated with the hazard of GvHD as estimated by the TVCM, both on the original scale and after log transformation, even though the HR values are not comparable owing to the transformation. The TVCM hazard ratio of 1.14 indicates that a unit increase in the PTX3 marker corresponds to a 1.14-fold increase in the hazard of developing GvHD.

Table 7 Estimates of the association of PTX3, and log(PTX3), with time to GvHD from the baseline Cox model, TVCM and joint model

The joint models using constant and Weibull hazards estimated a lower, non-significant association between PTX3 and time to GvHD. Interestingly, when the hazard was modeled by splines, the HR point estimate was equal to the one obtained by the TVCM (1.14), but with higher variability. When the log of PTX3 was used in a joint model with a spline baseline hazard, an HR (95% CI) of 3.11 (1.05, 9.18) was obtained. It follows that a unit increase in log PTX3 was associated with a 3.11-fold increase in the risk of developing GvHD. This value was greater than the HR of 1.82 estimated by the TVCM, but with higher variability.

Overall, we notice great variability among the joint model estimates of the HR, ranging from 0.76 up to 3.11. This can be directly linked to the misspecification of the marker and hazard distributions in some of the applied models, consistent with the simulation results. The Cox model was unaffected by the non-normality of the marker and by the shape of the hazard.

Figure 4 shows the Kaplan-Meier (KM) estimate of GvHD occurrence and the predicted marginal survival from each of the applied joint models. The spline-based survival curve was much closer to the KM curve, suggesting that the splines were able to capture the strong non-linear hazard shape shown in Fig. 3b. The curve associated with the Weibull hazard was in agreement with the KM estimate until the 4th week of follow-up, but the difference from the KM estimate increased over time. As expected, the survival curve associated with the constant hazard deviated largely from the KM curve.

Fig. 4 Observed Kaplan-Meier (KM) curve and predicted survival curves from the joint model assuming constant, Weibull and spline-based hazards. A logarithmic transformation of PTX3 was used in the joint models

Discussion

Investigating biological biomarkers as markers of diagnosis, prognosis or response to treatment requires inferential tools for the association between the longitudinal process of the marker and the progression of the disease. The TVCM has been the standard approach, but its partial likelihood assumes constant biomarker values between follow-up times and ignores measurement error. There has been some effort to extend the Cox model to accommodate measurement error, such as regression calibration (e.g., [33]), which however requires a validation subsample that is often not available. Modeling the longitudinal profile of the biomarker with a linear mixed model is another approach to obtain an estimate of the expected biomarker value free from measurement error, which can then be included as a covariate in the TVCM in a two-stage approach [17]. Joint models simultaneously analyze the longitudinal marker profile and the time to an event, overcoming both the LOCF and the measurement error issues. Joint models are, however, computationally intensive and require additional assumptions over the TVCM. In this paper, we performed a comprehensive simulation study with the goal of clarifying the assumptions relevant for understanding a joint model and of assessing its robustness under key model misspecifications. Further, we disentangled the bias introduced by LOCF and by measurement error in the TVCM and compared its performance with that of the joint model. Overall, we illustrated that the TVCM underestimates the association in the presence of measurement error, and that the major source of the TVCM bias was attributable to measurement error rather than to LOCF. On the other hand, the joint model can be severely biased under model misspecification.

Firstly, we considered how estimates from a joint model may be biased under misspecification of the normality assumption for the true marker distribution. Violation of this assumption is an issue for joint models, as the random effects play a central role in characterizing the association between the marker and the hazard of an event [10]. To avoid parametric distributional assumptions, joint models based on semi-parametric [31] or non-parametric [5] assumptions have been proposed. Further, [32] showed that parameter estimates are robust to misspecification as the number of measurements per subject increases. We showed that the misspecification has a negligible effect on the estimate of the association parameter as long as the sample size is large, regardless of the parametric distribution being adopted. The TVCM was not affected by the marker distribution. This is expected, but it is worth underlining here to discourage unnecessary log-transformations aimed at normality in the Cox model framework, which are sometimes seen in the medical literature (e.g., [34]).

Second, we examined the impact of misspecifying the longitudinal marker trajectory on the estimate of the association between the marker and the hazard of an event. This is motivated by the fact that the true underlying marker trajectory is typically unknown, since we only observe an error-contaminated and intermittently measured marker. To effectively characterize the association, the true marker trajectory must be appropriately estimated [10]. We illustrated that failing to capture the underlying marker trajectory, at different amounts of measurement error, leads to substantially biased estimates in the joint model, while the TVCM is unaffected by this misspecification since it does not assume any shape for the marker. [17] similarly found that, at a fixed measurement error, estimates from the joint model are biased under marker trajectory misspecification; however, they also suggested that this bias is still smaller than that of the TVCM.

Furthermore, we found that a misspecification of the baseline hazard in the joint modeling framework has an important effect on the estimate of the association between the longitudinal marker and the hazard of an event. This issue had not previously been considered in the joint model literature, but our simulations indicated that the association estimate was severely biased when the data-generating hazard process was misspecified. This was particularly evident when we attempted to model a highly non-linear hazard shape with a constant or Weibull hazard. On the other hand, the association estimate from the TVCM was insensitive to the baseline hazard, as its shape is left unspecified. In the joint modeling framework, leaving the baseline hazard unspecified severely underestimates the standard errors of the parameters [21], even though it appears to be the most common choice, as shown in a recent meta-analysis on joint models [25]. Thus, the baseline hazard in the joint model should be carefully modeled, using splines if necessary, to avoid bias in the association estimate. The two modeling techniques were illustrated using real data on HSCT to evaluate PTX3 as a marker of GvHD. The joint model, with the hazard modeled by spline functions, indicated PTX3 as a potential prognostic marker of GvHD. This was corroborated by the TVCM, even if it provided a lower association estimate.

In conclusion, joint models are a powerful tool, able to account for marker measurement error and to model the marker trajectory in time. However, they rely on strong assumptions that need to be properly validated, and avoiding bias due to model misspecification is crucial for a joint model to provide a substantive benefit over the semi-parametric Cox model with a time-varying covariate. It may also be argued that the comparison favours the joint model, because the data-generating scheme in our simulation included biomarker measurement error whereas the TVCM does not assume its presence. We therefore also showed that the joint model performed better than a TVCM that accounts for measurement error in the biomarker through a two-stage approach, while requiring similar hypotheses; the results are provided in Additional file 1.

Availability of data and materials

The datasets along with the simulation code used during the current study are available from the corresponding author on reasonable request.

Abbreviations

ASE:

Asymptotic standard error

CI:

Confidence interval

CP:

Coverage probabilities

ESE:

Empirical Monte Carlo standard error

Est:

Mean of the maximum likelihood estimates

GvHD:

Acute graft-versus-host disease

HR:

Hazard ratio

HSCT:

Haematopoietic stem cell transplantation

KM:

Kaplan-Meier

LOCF:

Last observation carried forward

PTX3:

Long Pentraxin 3

TVCM:

Time-varying covariate Cox model

References

  1. Dander E, De Lorenzo P, Bottazzi B, Quarello P, Vinci P, Balduzzi A, Masciocchi F, Bonanomi S, Cappuzzello C, Prunotto G, et al. Pentraxin 3 plasma levels at graft-versus-host disease onset predict disease severity and response to therapy in children given haematopoietic stem cell transplantation. Oncotarget. 2016; 7:82123–38.

  2. Cox DR. Regression models and life tables. J Royal Stat Soc. 1972; 34:187–220.

  3. Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. New Jersey: Wiley; 2012.

  4. Andersen PK, Gill RD, Borgan Ø, Keiding N. Statistical models based on counting processes. New York: Springer; 1993.

  5. Tsiatis AA, Davidian MA. A semiparametric estimator for the proportional hazards model with longitudinal covariates measured with error. Biometrika. 2001; 88:447–58.

  6. Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982; 69:331–42.

  7. Dupuy JF, Mesbah M. Joint modeling of event time and nonignorable missing longitudinal data. Lifetime Data Anal. 2002; 8:99–115.

  8. Tsiatis AA, Davidian MA. Joint modeling of longitudinal and time-to-event data: an overview. Stat Sin. 2004; 14:809–34.

  9. Proust-Lima C, Sene M, Taylor JM, Jacqmin-Gadda H. Joint latent class models for longitudinal and time-to-event data: A review. Stat Methods Med Res. 2014; 23:74–90.

  10. Rizopoulos D. Joint models for longitudinal and time-to-event data: With applications in R. Boca Raton: Chapman and Hall/CRC; 2012.

  11. Ibrahim JG, Chu H, Chen LM. Basic concepts and methods for joint models of longitudinal and survival data. J Clin Oncol. 2010; 28:2796–801.

  12. Gould AL, Boye ME, Crowther MJ, Ibrahim JG, Quartey G, Micallef S, Bois FY. Responses to discussants of ’Joint modeling of survival and longitudinal non-survival data: current methods and issues. report of the DIA Bayesian joint modeling working group. Stat Med. 2015; 34:2202.

  13. Köhler M, Umlauf N, Beyerlein A, Winkler C, Ziegler AG, Greven S. Flexible Bayesian additive joint models with an application to type 1 diabetes research. Biom J. 2017; 59:1144–65.

  14. Sauerbrei W, Abrahamowicz M, Altman DG, Cessie S, Carpenter J. STRengthening analytical thinking for observational studies: the STRATOS initiative. Stat Med. 2014; 33:5413–32.

  15. Boulesteix AL, Binder H, Abrahamowicz M, Sauerbrei W. Simulation Panel of the STRATOS Initiative. Biom J. 2008; 60:216–8.

  16. Crowther MJ, Lambert PC, Abrams KR. Adjusting for measurement error in baseline prognostic biomarkers included in a time-to-event analysis: a joint modelling approach. Med Res Methodol. 2013; 13:146–54.

  17. Sweeting MJ, Thompson SG. Joint modelling of longitudinal and time-to-event data with application to predicting abdominal aortic aneurysm growth and rupture. Biom J. 2011; 53:750–63.

  18. Marubini E, Valsecchi MG. Analysing Survival Data from Clinical Trials and Observational Studies. Wiley: West Sussex; 1996.

  19. Henderson R, Diggle P, Dobson A. Joint modelling of longitudinal measurements and event time data. Biostatistics. 2000; 1:465–80.

  20. Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer; 2000.

  21. Hsieh F, Tseng Y, Wang J. Joint modeling of survival and longitudinal data: likelihood approach revisited. Biometrics. 2006; 62:1037–43.

  22. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med. 1998; 26:217–38.

  23. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med. 2005; 24:1713–23.

  24. Burton A, Altman DG, Royston P, Holder RL. The design of simulation studies in medical statistics. Stat Med. 2006; 25:4279–92.

  25. Sudell M, Kolamunnage-Dona R, Tudur-Smith C. Joint models for longitudinal and time-to-event data: a review of reporting quality with a view to meta-analysis. BMC Med Res Methodol. 2016; 16:168.

  26. Rebora P, Galimberti S, Valsecchi MG. Using multiple timescale models for the evaluation of a time-dependent treatment. Stat Med. 2015; 34:3648–60.

  27. Lee M, Czene K, Rebora P, Reilly M. Patterns of changing cancer risks with time since diagnosis of a sibling. Int J Cancer. 2015; 136:1948–56.

  28. Arisido MW. Functional measure of ozone exposure to model short-term health effects. Environmetrics. 2016; 27:306–17.

  29. Ferrara JLM, Levine JE, Reddy P, Holler E. Graft-versus-host disease. The Lancet. 2009; 373:1550–61.

  30. Rebora P, Salim A, Reilly M. A flexible tool for nonparametric smoothing of the hazard function. The R J. 2014; 6/2:114–22.

  31. Song X, Davidian M, Tsiatis AA. A semiparametric likelihood approach to joint modeling of longitudinal and time-to-event data. Biometrics. 2002; 58:742–53.

  32. Rizopoulos D, Verbeke G, Molenberghs G. Shared parameter models under random effects misspecification. Biometrika. 2008; 95:63–74.

  33. Prentice RL. Covariate measurement errors and parameter estimation in a failure time regression model. Biometrika. 1982; 69:331–42.

  34. de Vries EM, Wang J, Williamson KD, Leeflang MM, Boonstra K, Weersma RK, et al. A novel prognostic model for transplant-free survival in primary sclerosing cholangitis. Gut. 2017. https://0-doi-org.brum.beds.ac.uk/10.1136/gutjnl-2016-313681.

Acknowledgements

Not applicable.

Funding

This work was supported by the grant SIR RBSI14LOVD of the Italian Ministry of Education, University and Research. We acknowledge that this research was partially supported by the Italian Ministry of University and Research (MIUR) - Department of Excellence project PREMIA (PREcision MedIcine Approach: bringing biomarker research to clinic). The sponsor has no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Contributions

MW conducted the simulations, analyzed the application data and was a major contributor in writing the manuscript. PR supervised and coordinated the research work, designed the simulation protocol and data analysis, interpreted the results and was a major contributor in writing the manuscript. LA contributed to describing the methods and interpreting the result. DPB was involved in interpreting the result and discussing the research work. MG evaluated the research work and revised the writing of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Paola Rebora.

Ethics declarations

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1

Supplementary Appendix.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Arisido, M.W., Antolini, L., Bernasconi, D.P. et al. Joint model robustness compared with the time-varying covariate Cox model to evaluate the association between a longitudinal marker and a time-to-event endpoint. BMC Med Res Methodol 19, 222 (2019). https://0-doi-org.brum.beds.ac.uk/10.1186/s12874-019-0873-y
