 Research article
 Open Access
 Open Peer Review
 Published:
Bias and precision of methods for estimating the difference in restricted mean survival time from an individual patient data metaanalysis
BMC Medical Research Methodology volume 16, Article number: 37 (2016)
The Erratum to this article has been published in BMC Medical Research Methodology 2016 16:71
Abstract
Background
The difference in restricted mean survival time (\( rmstD\left({t}^{\ast}\right) \)), the area between two survival curves up to time horizon \( {t}^{\ast } \), is often used in costeffectiveness analyses to estimate the treatment effect in randomized controlled trials. A challenge in individual patient data (IPD) metaanalyses is to account for the trial effect. We aimed at comparing different methods to estimate the \( rmstD\left({t}^{\ast}\right) \) from an IPD metaanalysis.
Methods
We compared four methods: the area between KaplanMeier curves (experimental vs. control arm) ignoring the trial effect (Naïve KaplanMeier); the area between Peto curves computed at quintiles of event times (Petoquintile); the weighted average of the areas between either trialspecific KaplanMeier curves (Pooled KaplanMeier) or trialspecific exponential curves (Pooled Exponential). In a simulation study, we varied the betweentrial heterogeneity for the baseline hazard and for the treatment effect (possibly correlated), the overall treatment effect, the time horizon \( {t}^{\ast } \), the number of trials and of patients, the use of fixed or DerSimonianLaird random effects model, and the proportionality of hazards. We compared the methods in terms of bias, empirical and average standard errors. We used IPD from the MetaAnalysis of Chemotherapy in Nasopharynx Carcinoma (MACNPC) and its updated version MACNPC2 for illustration that included respectively 1,975 and 5,028 patients in 11 and 23 comparisons.
Results
The Naïve KaplanMeier method was unbiased, whereas the Pooled Exponential and, to a much lesser extent, the Pooled KaplanMeier methods showed a bias with nonproportional hazards. The Petoquintile method underestimated the \( rmstD\left({t}^{\ast}\right) \), except with nonproportional hazards at \( {t}^{\ast } \)= 5 years. In the presence of treatment effect heterogeneity, all methods except the Pooled KaplanMeier and the Pooled Exponential with DerSimonianLaird random effects underestimated the standard error of the \( rmstD\left({t}^{\ast}\right) \). Overall, the Pooled KaplanMeier method with DerSimonianLaird random effects formed the best compromise in terms of bias and variance. The \( rmstD\left({t}^{\ast },=,10,\kern0.5em ,\mathrm{years}\right) \) estimated with the Pooled KaplanMeier method was 0.49 years (95 % CI: [−0.06;1.03], p = 0.08) when comparing radiotherapy plus chemotherapy vs. radiotherapy alone in the MACNPC and 0.59 years (95 % CI: [0.34;0.84], p < 0.0001) in the MACNPC2.
Conclusions
We recommend the Pooled KaplanMeier method with DerSimonianLaird random effects to estimate the difference in restricted mean survival time from an individualpatient data metaanalysis.
Background
In costeffectiveness analysis, a commonly used survival measure is the restricted mean survival time (RMST). It estimates the life expectancy for one treatment arm up to a certain time horizon \( {t}^{\ast } \) [1–4]. The difference in restricted mean survival time (\( rmstD\left({t}^{\ast}\right) \)) can thus quantify the treatment effect expressed in terms of life years gained. The \( rmstD\left({t}^{\ast}\right) \) is an appealing outcome measure as it is valid even in case of nonproportional hazards [5]. Moreover, as an absolute outcome, the interpretation of the \( rmstD\left({t}^{\ast}\right) \) is considered more intuitive from the clinician point of view than the hazard ratio (HR) which is a relative measure of the treatment effect [3, 5, 6]. Recent studies have compared methods to estimate the RMST including extrapolation beyond the trial followup [7–10]. However, these studies focused on the use of a single randomized clinical trial and not specifically on multicenter clinical trials nor metaanalyses.
In metaanalyses or in multicenter clinical trials, there is a need to take into account the trial or center effect to avoid the Simpson’s paradox that may lead to biased estimates [11–13]. Different authors have discussed the use of Cox models with either stratification or fixed effect, or random effects to account for the center effect in a multicenter clinical trial [14–16] or the trial effect in a metaanalysis [17–20] in presence of baseline hazard heterogeneity and/or treatment effect heterogeneity between centers or trials. Several papers have also compared onestage or twostage methods to estimate the hazard ratio in individual patient data (IPD) metaanalyses [20–22]. All these studies focused on the estimation of the treatment effect through the use of the hazard ratio, but so far only one has focused on the use of the \( rmstD\left({t}^{\ast}\right) \) in IPD metaanalyses [6]. In this latter study, Wei and colleagues investigated three twostage methods to estimate the \( rmstD\left({t}^{\ast}\right) \) from an IPD metaanalysis: two nonparametric methods – one based on pseudovalues [23] and one based on the KaplanMeier estimate – and a flexible parametric survival model [24]. In their study, the \( rmstD\left({t}^{\ast}\right) \) was estimated as an aggregation of the \( rmstD\left({t}^{\ast}\right) \) estimated in each trial using a fixed effect metaanalysis model. The authors showed via simulations and two case studies that the three methods produced similar results in terms of bias and coverage probability of the confidence intervals.
In the present paper, we aimed at extending the first study from Wei et al. [6] by comparing more methods for estimating the \( rmstD\left({t}^{\ast}\right) \) from an IPD metaanalysis in a broader range of scenarios. We also designed a more realistic simulation study with random effects to induce betweentrial heterogeneity, both in terms of baseline hazard and of treatment effect. We considered only one of the nonparametric methods studied by Wei and colleagues – the one pooling KaplanMeier estimates – as they found similar results for the three methods. We further considered a parametric method – pooling exponential estimates – and two other nonparametric methods: one naïve method that does not account for trial effect and an actuarial survival method developed by Peto, classically used in IPD metaanalyses for computing survival curves [25–27]. In simulations, we varied not only the treatment effect size and the time horizon \( {t}^{\ast } \), as previously done by Wei and colleagues, but also the baseline hazard heterogeneity, the treatment effect heterogeneity, the correlation between these two random effects, the number of trials and the number of patients per trial, the use of fixed effect and DerSimonianLaird random effects model [28], and departure from the assumption of proportional hazards.
In the ‘Methods’ section we describe the \( rmstD\left({t}^{\ast}\right) \) and the different survival analysis methods for estimating the \( rmstD\left({t}^{\ast}\right) \) that we investigate in this paper. Section ‘Simulation study’ describes the design of the simulation study, how to estimate the true \( rmstD\left({t}^{\ast}\right) \), the simulation scenarios and the evaluation criteria, and presents the simulation results. Section ‘Application’ provides two examples using IPD metaanalyses in nasopharynx carcinoma. We end with a discussion and our conclusion regarding the choice of a particular method to estimate the \( rmstD\left({t}^{\ast}\right) \) from an IPD metaanalysis. Of note, the investigated methods can also be used for the estimation of the \( rmstD\left({t}^{\ast}\right) \) in multicenter clinical trials.
Methods
Difference in restricted mean survival time
Let T be the survival time random variable with distribution F(t). The mean survival time restricted at a specified time \( {t}^{\ast } \) is defined as
where S(t) = 1 – F(t) is the survival function. The RMST \( {t}^{\ast } \) corresponds graphically to the area under the survival curve S(t) up to \( {t}^{\ast } \). The difference in restricted mean survival time (\( rmstD\left({t}^{\ast}\right) \)) between the experimental arm and the control arm (noted 1 and 0) is defined as
The variance \( \widehat{Var}\left( rmstD\left({t}^{\ast}\right)\right) \) can be estimated as [29]:
As opposed to the relative hazard ratio, the \( rmstD\left({t}^{\ast}\right) \) is an absolute outcome which depends both on the baseline hazard and on the relative treatment effect.
Wei et al. [6] also proposed the use of the relative difference in restricted mean survival time defined as
The \( rmstD\left({t}^{\ast}\right) \) varies between 0 and 1 and can be interpreted as a percentage. Its variance can be estimated as \( \widehat{Var}\left[ rmstD\left({t}^{\ast}\right)\right]/{\left({t}^{\ast}\right)}^2 \).
Methods for estimating the difference in restricted mean survival time
We investigated four methods for estimating the difference in restricted mean survival time \( rmstD\left({t}^{\ast}\right) \) from an IPD metaanalysis: 1) the Naïve KaplanMeier, which pools the data, ignoring the trial effect, 23) the Pooled KaplanMeier and Pooled Exponential methods, which use a twostage approach to combine \( {rmstD}_j\left({t}^{\ast}\right) \) estimated in each trial j, and 4) the Petoquintile method, which uses survival functions derived from a pooled hazard ratio estimated with a twostage approach in order to take into account the trial effect.
Naïve KaplanMeier
The most basic method to estimate the \( rmstD\left({t}^{\ast}\right) \) is to consider the IPD metaanalysis as a single large trial. Under this approach, the \( rmstD\left({t}^{\ast}\right) \) is estimated as the area between the KaplanMeier curves of the experimental and the control arm, obtained by pooling the data from all the trials, thus ignoring the trial effect [30]:
with Ŝ _{ arm }(t _{ arm,0}) = 0, t _{ arm,0} = 0, D _{ arm } the number of distinct event times (t _{ arm,1 } < t _{ arm,2 } < … < t _{ arm,D }) and Ŝ _{ arm }(t) the KaplanMeier estimator for the experimental arm and the control arm (noted 1 and 0). The variance of the \( rmstD\left({t}^{\ast}\right) \) was estimated analytically by the delta method for the Naïve KaplanMeier (details are provided in Additional file 1).
Pooled KaplanMeier and Pooled Exponential
In order to take into account the trial effect, a different strategy consists in estimating the \( {rmstD}_j\left({t}^{\ast}\right) \) within each trial j and then to combine the trialspecific results into a pooled estimate. In the Pooled KaplanMeier and Pooled Exponential methods, which are both twostage approaches, we first estimated the \( {rmstD}_j\left({t}^{\ast}\right) \) in each trial j as the area between either trialspecific KaplanMeier curves
or trialspecific exponential curves
with for each trial j: Ŝ _{ j,arm }(t _{ j,arm,0}) = 0, t _{ j,arm,0} = 0, D _{ j,arm } the number of distinct event times (t _{ j,arm,1 } < t _{ j,arm,2 } < … < t _{ j,arm,D }), Ŝ _{ j,arm }(t) the KaplanMeier estimator, and \( {\widehat{\lambda}}_{j, arm} \) the maximum likelihood estimate of the scale parameter for the exponential distribution for the experimental arm and the control arm (noted 1 and 0).
Then, we combined the \( {rmstD}_j\left({t}^{\ast}\right) \) by using a fixed effect or a DerSimonianLaird random effects [28] metaanalysis model (see Additional file 1). The variance of each \( {rmstD}_j\left({t}^{\ast}\right) \) was estimated analytically by the delta method for the Pooled KaplanMeier and Pooled Exponential methods (details are provided in Additional file 1).
Petoquintile
The actuarial method was developed by Peto to compute the survival curves taking into account the trial effect in IPD metaanalyses [25–27, 31]. In this case, the survival probabilities in the two arms are estimated at the end of predetermined time intervals i based on the estimated survival probability p _{ i } in the overall population and a pooled hazard ratio HR _{ i,pooled }. The survival probability for the overall population is estimated as
where D _{ i } is the number of deaths during interval i and PI _{ i } the total number of personintervals in the ith interval. One personinterval is equivalent to one personyear when the time interval chosen is 1 year. The pooled hazard ratio in the interval, \( {\widehat{HR}}_{i, pooled}, \) is estimated using a fixed effect metaanalysis model to aggregate the different \( {\widehat{HR}}_{i,j} \) estimated in each trial j. The survival probabilities at the end of each interval i in the control arm (p _{ 0,i }) and in the experimental arm (p _{ 1,i }) are estimated as follows:
The survival probability at time t in each arm is the product of the survival probabilities across n _{i} intervals up to t
In the present work, we extended this method by also pooling the \( {\widehat{HR}}_{i,j} \) using a DerSimonianLaird random effects metaanalysis model (see Additional file 1). Furthermore, we used time intervals i based on the quintiles of the overall number of deaths occurring before \( {t}^{\ast } \) in the metaanalysis and therefore we called this method the Petoquintile method. The \( rmstD\left({t}^{\ast}\right) \) was then estimated as the area between the two survival curves defined by Ŝ _{ Peto,0}(t) and Ŝ _{ Peto,1}(t).
where t _{ 0 } = 0 and \( \left({t}_1,..,\kern0.2em {t}_5={t}^{*}\right) \) denotes the time intervals based on the quintiles of events.
A 50replicate nonparametric bootstrap was used to estimate the variance of the \( rmstD\left({t}^{\ast}\right) \) for the Petoquintile method.
Followup differences across trials and extrapolation
We used the extrapolation method proposed by Brown et al. [32] for Naïve KaplanMeier and Pooled KaplanMeier, to extrapolate the survival function beyond the last observed event time (t _{ max }) until \( {t}^{\ast } \), if needed (e.g. in case of potential followup difference across trials) [10, 32]. The Brown extrapolation method consists in completing the tail of the KaplanMeier survival curve by an exponential curve. The estimated survival function for t > t _{ max } is:
where Ŝ (t) is the KaplanMeier estimator of the survival function.
No extrapolation was needed for Pooled Exponential as a parametric model is used. Concerning the Petoquintile method, at least one event is needed overall in the metaanalysis in each arm and at each time interval i for survival probabilities p _{ 0,i } and p _{ 1,i } to be computed. This was always the case, even in case of a potential difference in followup across trials, as each time interval contained by definition one fifth of the deaths occurring before \( {t}^{\ast } \). It is worth noting that if \( {t}^{\ast } \) is greater than t _{ max } the last observed event in the whole metaanalysis, the method estimates the event rate p _{ i } and the pooled hazard ratio HR _{ i,pooled } until t _{ max } and implicitly extrapolates the survival in the remaining interval \( \left[{t}_{max},{t}^{*}\right] \).
Simulation study
Design of the simulation study
We simulated survival times of N patients from J trials, each of size n _{ j } with ∑ _{ j = 1} ^{J} n _{ j } = N. We defined the hazard function for a trial j ϵ {1 , …, J} as
where λ _{ 0 } (t) is the baseline hazard function, A _{ j } is a trialspecific random quantity affecting the baseline hazard with Var(A _{ j }) = σ ^{2}, and B _{ j } is a trialspecific random quantity affecting the treatment effect with Var(B _{ j }) = τ ^{2}, β is the overall treatment effect, and x is the binary treatment variable, coded +1/2 for experimental arm, −1/2 for control in order to obtain equal heterogeneity in both arms [19, 21, 22]. Of note, the covariance between the two random effects is defined by cov(A _{ j },B _{ j }) = ρστ with ρ the correlation between A _{ j } and B _{ j }.
We used an exponential distribution for the baseline hazard function λ_{0}(t) = (log(2)/5)t, corresponding to a median survival time of 5 years. Independent and noninformative rightcensoring was induced by setting the recruitment time at 3 years for all trials and varying the maximum followup time uniformly between 2 and 9 years across trials to replicate the typical difference in observed followup between trials included an IPD metaanalysis.
We induced betweentrial heterogeneity by generating random values a _{ j } and b _{ j } from binomial distributions for the baseline hazard and the treatment effect. The use of a discrete distribution allowed us to derive straightforwardly the true difference in restricted mean survival time (\( rmstD\left({t}^{\ast}\right) \)). The binomial random variables were centered and properly rescaled in order to obtain the desired variances σ ^{2} and τ ^{2}:
The rationale for the arbitrary choice of n = 50 was that the distribution approximated well a continuous distribution, while allowing easy computation of the true \( rmstD\left({t}^{\ast}\right) \).
True difference in restricted mean survival time
Based on our simulation model, the difference in restricted mean survival time is defined as
where \( \mathcal{K}={\left\{\left({a}_k,{b}_k\right)\right\}}_{k=1,\dots,\ K} \) is the support of the bivariate variable (A,B). The joint distribution F _{ A,B } (a,b) is defined by the probabilities p _{ k } = ℙ(A = a _{ k }, B = b _{ k } )_{ k = 1,…, K } of all the K admissible couples of values (a _{ k },b _{ k }). Thanks to the use of a discrete joint probability distribution F _{ A,B } (a,b), the integral in equation (18) boils down to a sum over the K points belonging to its support \( \mathcal{K} \):
with the conditional restricted mean survival time \( {rmstD}_k\left({t}^{\ast}\right) \) defined for a couple (a _{ k,} b _{ k }) as:
Simulation scenarios
In different scenarios we varied the strength of betweentrial heterogeneity for the baseline hazard (low with σ ^{2} = 0.01 and high with σ ^{2} = 0.10) and for the treatment effect (τ ^{2} = 0.01, 0.10). We performed the main analysis with uncorrelated random effects (ρ = 0), however, in a sensitivity analysis, we studied the impact of a negative correlation between A _{ j } and B _{ j } (ρ = −0.8). We considered different values for the number of trials and patients per trial: (J = 5, n _{ j } = 200) and (J = 20, n _{ j } = 100) and for the size of the overall treatment effect (β = 0, ±0.2, ±0.7). We also studied the impact of the time horizon of restriction at \( {t}^{\ast } \)= 5 years and \( {t}^{\ast } \)= 10 years. These two values were chosen to illustrate a scenario in which all trials still have patients at risk at \( {t}^{\ast } \) (5 years) and a scenario in which some trials’ followup are shorter than the time of restriction (\( {t}^{\ast } \)= 10 years). The average administrative censoring rate ranged across scenarios from 49 to 52 % at \( {t}^{\ast } \)= 5 years and from 38 to 40 % at \( {t}^{\ast } \)= 10 years. In the case of no overall treatment effect at all (β = 0, τ ^{2} = 0) and no baseline heterogeneity (σ ^{2} = 0), the restricted mean survival time was equal to 3.6 years at \( {t}^{\ast } \)= 5 years and 5.4 years at \( {t}^{\ast } \)= 10 years in both arms. The influence of nonproportional hazards was examined using a piecewise exponential distribution with a deleterious treatment effect (β’ = −β, with β ≤ 0) in the first 2 years and a beneficial treatment effect (β) afterwards.
Evaluation criteria
We simulated 1,000 metaanalyses for each scenario and compared the four methods using: the average bias, defined as the average of the estimated \( rmstD\left({t}^{\ast}\right) \) minus the true value; the empirical standard error (ESE), defined as the standard deviation of the \( rmstD\left({t}^{\ast}\right) \) over the replicates; and the average standard error (ASE), defined as the average of the estimated standard errors [33].
With the exception of the Naïve KaplanMeier method, the methods were not available in standard statistical software. We implemented the methods and performed the simulation study using R version 3.1.3 (R Foundation, Vienna, Austria). The R code is available from the authors upon request.
Results
For all scenarios, there was almost no bias in the case of no treatment effect (β = 0). When there was a beneficial treatment effect (β = −0.2 or −0.7), Petoquintile underestimated the \( rmstD\left({t}^{\ast}\right) \), notably on the long term (\( {t}^{\ast } \)= 10 years). The Pooled Exponential and, to a much lesser extent, the Pooled KaplanMeier methods showed a bias in the case of nonproportional hazards. In all these cases, whenever a method showed a bias, the bias increased with β (Table 1 and Fig. 1).
In scenarios with higher treatment effect heterogeneity (τ ^{2} = 0.10), all the methods had higher empirical standard error (ESE), as shown in the Figs. 1 and 2, and Tables 1 and 2. The standard error was estimated correctly (ASE = ESE) only with Pooled KaplanMeier and Pooled Exponential. It was generally underestimated (ASE < ESE) with the two other methods: the ASE was twofold smaller than the ESE for the Naïve KaplanMeier and the Petoquintile methods with τ ^{2} = 0.10. When varying the baseline hazard heterogeneity between trials, no relevant impact was noted neither on the bias nor on the standard error.
With both proportional and nonproportional hazards, for β = −0.7, the Petoquintile method showed a bias which was negligible at \( {t}^{\ast } \)= 5 years but much higher at \( {t}^{\ast } \)= 10 years (up to 0.21 years; Fig. 1b and Fig. 2b). In the case of nonproportional hazards, which were incorporated using a piecewise exponential distribution with a deleterious treatment effect in the first 2 years and a beneficial treatment effect afterwards, Pooled Exponential was heavily biased at 5 years, with a bias of almost 0.40 years as compared to a true \( rmstD\left({t}^{\ast },=,5\right) \) = −0.30 years (Fig. 2a and Table 2). This bias suggests that the Pooled Exponential method failed to reflect the piecewise exponential distribution with β’ = −β (β ≤ 0) for t ϵ [0;2] years and β for t > 2 years. However, this bias disappeared at 10 years, arguably because the true hazards were proportional between 2 and 10 years in our simulation setup (Fig. 2b) and the different effect in the first 2 years was thus attenuated. A small bias also arose for Pooled KaplanMeier at 10 years when the hazards were not proportional: a bias of around 0.05 years as compared to a true \( rmstD\left({t}^{\ast },=,10\right) \) = 0.30 years (Fig. 2b).
In terms of standard error, lower values were found for both ESE and ASE at \( {t}^{\ast } \) = 5 years (Fig. 1a and Fig. 2a) as compared at \( {t}^{\ast } \)= 10 years (Fig. 1b and Fig. 2b), no matter the hazards were proportional or not.
The number of trials and the size of trials had no major impact in terms of bias. In metaanalyses of J = 20 trials and n _{ j } = 100 patients per trial, all the methods had lower empirical and average standard errors than in metaanalyses of J = 5 trials and n _{ j } = 200 patients per trial (see Additional file 1: Tables S1, 2, 3 and 4).
We also considered a deleterious treatment effect (β = +0.2, +0.7) but, as expected, results were not affected: the biased methods had biases that were reversed, and ASE and ESE remained unchanged (Additional file 1: Table S5).
The introduction of a negative correlation between A _{ j } and B _{ j } also had no major impact in terms of bias and standard error estimation, with the exception of the scenario with high baseline hazard and treatment variances (σ ^{2} = τ ^{2} = 0.10) for which ASE and ESE of all the methods were higher than with no correlation, notably for β = −0.7 (Additional file 1: Table S6).
When using a fixed effect metaanalysis model (Additional file 1: Table S7), for scenarios with high treatment effect heterogeneity (τ ^{2} = 0.10), the Pooled KaplanMeier, Pooled Exponential and Petoquintile methods exhibited a larger bias as compared with a DerSimonianLaird random effects models used in Table 1. Furthermore, using a fixed effect model underestimated the standard error in general (ASE < <ESE).
Application
We illustrate the four methods for estimating the \( rmstD\left({t}^{\ast}\right) \) using IPD from the MetaAnalysis of Chemotherapy in Nasopharynx Carcinoma (MACNPC) Collaborative Group [34] and its updated version MACNPC2 [35] as these two IPD metaanalyses differed in terms of evidence of treatment effect heterogeneity. These IPD metaanalyses studied the addition of chemotherapy (CT) to radiotherapy (RT) in patients with nasopharynx carcinoma. For the estimation of the \( rmstD\left({t}^{\ast}\right) \), we selected \( {t}^{\ast } \)= 5 years and \( {t}^{\ast } \)= 10 years, as these were the two time points of clinical interest in the publications of MACNPC and MACNPC2.
MetaAnalysis of Chemotherapy in Nasopharynx Carcinoma (MACNPC)
The data from the MACNPC [34] included 1,975 patients in 11 treatment comparisons. The pooled HR estimated with a fixed effect model was 0.82 (95 % CI: [0.71;0.94]), indicating a significant improvement in overall survival with RT plus CT (p = 0.006). The treatment effect heterogeneity was significant (Qtest: p = 0.03; Higgins’ I^{2} = 50 %) which was explained by the timing of CT. The pooled HR estimated with a DerSimonianLaird random effects model [28] was 0.82 (95 % CI: [0.66;1.02], p = 0.08).
The overall proportional hazards assumption was verified at the 5 % significance level (p = 0.09) according to the methodology described by Wei et al. [6], in which trialspecific pvalues from GrambschTherneau test [36] are pooled. The \( rmstD\left({t}^{\ast}\right) \) ranged from 0.17 to 0.23 years at \( {t}^{\ast } \)= 5 years and from 0.46 to 0.55 years at \( {t}^{\ast } \)= 10 years across the estimation methods (Table 3). For Pooled KaplanMeier and Pooled Exponential using a random effects model, the \( rmstD\left({t}^{\ast}\right) \) was not significantly different from 0. As there was high treatment effect heterogeneity in the MACNPC, a DerSimonianLaird random effects model was deemed more appropriate to aggregate the trialspecific \( {rmstD}_j\left({t}^{\ast}\right) \). As previously seen in the simulation study, a fixed effect model would underestimate the variance of the overall estimate. Also, similarly to our simulation study with proportional hazards, larger values for \( rmstD\left({t}^{\ast}\right) \) and SE (\( rmstD\left({t}^{\ast}\right) \)) were found at \( {t}^{\ast } \)= 5 years as compared to at \( {t}^{\ast } \)= 10 years. Figure 3 displays the forest plot for trialspecific \( {rmstD}_j\left({t}^{\ast}\right) \) and overall \( rmstD\left({t}^{\ast}\right) \) estimated using Pooled KaplanMeier with DerSimonianLaird random effects at \( {t}^{\ast } \)= 10 for the MACNPC metaanalysis. Figure 4 displays the overall rmstD(t ^{*}) estimated by Pooled KaplanMeier with DerSimonianLaird random effects when varying \( {t}^{\ast } \); it shows that the \( rmstD\left({t}^{\ast}\right) \) is not significantly different from 0 for \( {t}^{\ast } \)ϵ [0;10] years. The same graphic for the overall \( rmstD\left({t}^{\ast}\right) \) is displayed as Additional file 1: Figure S1.
Update of MetaAnalysis of Chemotherapy in Nasopharynx Carcinoma (MACNPC2)
The MACNPC2 [35], the update of the MACNPC, included new trials as well as updated followup for trials included in the MACNPC (N = 5,028 patients within 23 comparisons). For overall survival, a significant pooled HR of 0.79 (95 % CI: [0.73;0.86], p < 0.001) in favor of CT + RT was obtained with a fixed effect model. In the MACNPC2, there was less evidence of treatment effect heterogeneity (Qtest: p = 0.09; Higgins’ I^{2} = 30 %) than in the MACNPC. The pooled HR with a DerSimonianLaird random effects model [28] was 0.79 (95 % CI: [0.70;0.87], p <0.001).
The pooled pvalue test (p = 0.16) suggested that the overall proportional hazards assumption was appropriate. The \( rmstD\left({t}^{\ast}\right) \) ranged from 0.17 to 0.21 years at \( {t}^{\ast } \)= 5 years and from 0.53 to 0.59 years at \( {t}^{\ast } \)= 10 across the estimation methods (Table 3). The \( rmstD\left({t}^{\ast}\right) \) was significantly different from 0 and in favor of the RT + CT arm with all of the methods. As compared to the results in the MACNPC, the standard error of the \( rmstD\left({t}^{\ast}\right) \) was lower in the MACNPC2 with a \( rmstD\left({t}^{\ast}\right) \) of similar magnitude for all the methods. This was consistent with the simulation results, as there were more trials and overall more patients included in the MACNPC2. The forest plot for the MACNPC2 displaying trialspecific \( {rmstD}_j\left({t}^{\ast}\right) \) and the overall \( rmstD\left({t}^{\ast}\right) \) at \( {t}^{\ast } \)= 10 years estimated using the Pooled KaplanMeier method with DerSimonianLaird random effects is provided in Additional file 1: Figure S2.
Discussion
The difference in restricted mean survival time (\( rmstD\left({t}^{\ast}\right) \)) is an appealing alternative to the hazard ratio (HR) as measure of treatment effect, because it does not require the proportional hazards assumption and is considered to have a more intuitive interpretation [3, 5, 6]. Furthermore, the \( rmstD\left({t}^{\ast}\right) \) is directly related to costeffectiveness analysis as it is the denominator of the incremental costeffectiveness ratio, so one can use the \( rmstD\left({t}^{\ast}\right) \) estimation from a previous publication to perform a costeffectiveness analysis. We previously showed that in a costeffectiveness analysis even small variations in the estimate of the \( rmstD\left({t}^{\ast}\right) \) from an individual patient data (IPD) metaanalysis can yield significantly different reimbursement conclusions [37]. However, to our knowledge only one evaluation of the methods to estimate the \( rmstD\left({t}^{\ast}\right) \) from IPD metaanalysis is available to date [6].
In this study, we compared different methods to estimate the \( rmstD\left({t}^{\ast}\right) \) from IPD metaanalysis in different scenarios varying several key metaanalysis parameters. We showed that Pooled KaplanMeier was rarely biased. Similarly, Naïve KaplanMeier was unbiased in all scenarios, whereas Pooled Exponential showed a bias with nonproportional hazards at \( {t}^{\ast } \)= 10 years and an even larger bias at t ^{*} = 5 years. Petoquintile underestimated the \( rmstD\left({t}^{\ast}\right) \), except with nonproportional hazards at \( {t}^{\ast } \)= 5 years. In case of treatment effect heterogeneity, the use of a fixed effect model was not appropriate and all methods except Pooled KaplanMeier and Pooled Exponential with DerSimonianLaird random effects underestimated the standard error of the \( rmstD\left({t}^{\ast}\right) \). Overall, the Pooled KaplanMeier method with DerSimonianLaird random effects formed the best compromise in terms of bias and variance for estimating the \( rmstD\left({t}^{\ast}\right) \) from IPD metaanalysis.
In the IPD metaanalyses studying the effect of chemotherapy (CT) plus radiotherapy (RT) versus RT alone in nasopharynx carcinoma, the \( rmstD\left({t}^{\ast },=,10,\kern0.5em ,\mathrm{years}\right) \) estimated using the Pooled KaplanMeier method with DerSimonianLaird random effects was 0.49 years (95 % CI: [−0.06;1.03], p = 0.08) in the MACNPC [34] and 0.59 years (95 % CI: [0.34;0.84], p < 0.0001) in its updated version MACNPC2 [35]. In other words, the addition of CT to RT extended the 10year mean survival time by 7.1 months (95 % CI 4.1;10.1) in MACNPC2. We believe the clinical interpretation with the \( rmstD\left({t}^{\ast}\right) \) is more intuitive than the one derived from the pooled hazard ratio with DerSimonianLaird random effects of 0.79 (95 % CI 0.70−0.87) in MACNPC2.
The \( rmstD\left({t}^{\ast}\right) \) is an absolute outcome measure which depends both on the baseline hazard and on the relative treatment effect. Consequently, the heterogeneity test when pooling the \( {rmstD}_j\left({t}^{\ast}\right) \) reflects both baseline hazard and relative treatment effect heterogeneities. Deeks already showed that in 551 systematic reviews with binary outcomes the heterogeneity was higher for an absolute outcome than for a relative outcome [38]. In our IPD metaanalyses in nasopharynx carcinoma, the heterogeneity was slightly higher when pooling the \( {rmstD}_j\left({t}^{\ast}\right) \) with Pooled KaplanMeier than when pooling the hazard ratios: for the MACNPC there was a small increase in the heterogeneity with Cochran Q test pvalue = 0.03, I^{2} = 50 % for HR as compared to p = 0.02, I^{2} = 54 % for \( rmstD\left({t}^{\ast },=,10,\kern0.5em ,\mathrm{years}\right) \) (Fig. 3). For the MACNPC2, this increase was more pronounced with p = 0.09, I^{2} = 30 % for HR versus p = 0.01, I^{2} = 45 % for \( rmstD\left({t}^{\ast },=,10,\kern0.5em ,\mathrm{years}\right) \) (Additional file 1: Figure S1). Wei and colleagues showed a similar trend in their second example (p = 0.47, I^{2} = 0 % for HR and p = 0.20, I^{2} = 24 % for \( rmstD\left({t}^{\ast },=,5,\kern0.5em ,\mathrm{years}\right) \)) [6].
In our simulation study, we have induced betweentrial heterogeneity for the baseline hazard and for the treatment effect using two random effects. As a matter of fact, both random effects can be tested, by testing Var(A _{ j }) = σ ^{2} = 0 and Var(B _{ j }) = τ ^{2} = 0. Testing for the presence of treatment effect heterogeneity (τ ^{2} = 0) corresponds to the Cochran Qtest which we have used in the MACNPC applications [34, 35]. Commenges and Andersen [39] and Biard and colleagues [40] proposed respectively the use of score tests or permutation tests for testing the baseline heterogeneity (σ ^{2} = 0) in proportional hazard models. Rondeau et al. [19] tested both the baseline hazard (σ ^{2} = 0) and the treatment effect heterogeneity between trials (τ ^{2} = 0) using a mixture of χ ^{2} distributions in onestage Cox models.
Recent techniques like the one proposed by Guyot et al. [41] allow one to reconstruct IPD based on published KaplanMeier curves, which could be useful to recalculate the \( rmstD\left({t}^{\ast}\right) \) even for aggregate data. However, we suggest that clinical publications for single (multicenter) clinical trial or IPD metaanalysis should report the \( rmstD\left({t}^{\ast}\right) \) at different time horizons \( {t}^{\ast } \) of clinical interest in addition to the hazard ratio. This way the \( rmstD\left({t}^{\ast}\right) \) would be available for future economic evaluations. This is of particular relevance as a previous study stated that the survival outcome in a costeffectiveness analysis based on a clinical trial or a metaanalysis should be estimated with the same statistical model used for efficacy [42].
Among the twostage methods studied by Wei et al., the nonparametric pseudovalues method was disregarded, as Wei et al. showed that it led to similar results as the nonparametric Pooled KaplanMeier method [6]. Also, among parametric models we chose the exponential model instead of the Royston and Parmar flexible parametric model for ease of computation. In addition, we chose to study other nonparametric methods from the medical literature that have been actually applied in practice. Parametric methods developed for network metaanalysis were not included [43, 44]. Furthermore, methods using the percentile ratio [45, 46] were beyond the scope of this study, which focused on the \( rmstD\left({t}^{\ast}\right) \). In addition, in this simulation study, we only considered balanced trials and we did not vary the administrative censoring rate.
The \( rmstD\left({t}^{\ast}\right) \) is inherently dependent on the choice of \( {t}^{\ast } \). Also, we showed that its standard error gets larger as \( {t}^{\ast } \)increases (Fig. 1). Karrison recommended to choose a maximum time horizon \( {t}^{\ast } \) such that SE(S(\( {t}^{\ast } \))) is less than a chosen ceiling value [29, 47]. In the particular case of an IPD metaanalysis, trials can have different lengths of followup, and there is thus a compromise to achieve between small values of \( {t}^{\ast } \) that censor a lot of data with a high loss of information, and high values of \( {t}^{\ast } \) that need a massive use of extrapolation. Wei and colleagues stated that the choice of \( {t}^{\ast } \) should also be of clinical interest, and they suggested plotting the \( rmstD\left({t}^{\ast}\right) \) against \( {t}^{\ast } \) to see how the treatment effect varies over time. In MACNPC for instance such a plot shows that the \( rmstD\left({t}^{\ast}\right) \) was not significantly different from 0 with \( {t}^{\ast } \)ϵ [0;10] years based on pointwise confidence intervals (Fig. 4). In two recent papers, Tian et al. [48] and Zhao et al. [4] have proposed a simultaneous confidence interval of the \( rmstD\left({t}^{\ast}\right) \) in the context of one randomized controlled trial. However, an extension to the context of IPD metaanalyses or multicenter clinical trials has not yet been proposed and may be the subject of further research.
Depending on the choice of the time horizon \( {t}^{\ast } \), some trials included in the IPD metaanalysis may have a followup not long enough to reach \( {t}^{\ast } \). In our study, for such trials, we used the extrapolation method proposed by Brown et al. [32] until \( {t}^{\ast } \) for the Naïve KaplanMeier and the Pooled KaplanMeier methods. Lamb and colleagues [10] have shown that this extrapolation method is less biased than the mean survival time restricted at the last observed event time. For lifetime extrapolation, which can be needed in costeffectiveness analysis, one can estimate the difference in mean survival time using the Pooled KaplanMeier with a DerSimonianLaird random effects model. In each trial, the difference in mean survival time would be estimated using KaplanMeier curves with extrapolated parametric tails [9, 10]. Similarly, for the two other nonparametric methods Naïve KaplanMeier and Petoquintile, one can extrapolate the survival curves beyond the last observed failure time by using an extrapolated parametric tail.
Conclusions
The difference in restricted mean survival time (\( rmstD\left({t}^{\ast}\right) \)) is an appealing alternative to the hazard ratio to measure the treatment effect in a metaanalysis of timetoevent outcomes, as it is free of the proportional hazards assumption and its interpretation is more intuitive. We compared methods to estimate the \( rmstD\left({t}^{\ast}\right) \) from an individual patient data metaanalysis. In our simulation study, in which a large panel of metaanalysis parameters was varied, the twostage Pooled KaplanMeier method with DerSimonianLaird random effects formed the best compromise in terms of bias and variance. Thus, Pooled KaplanMeier with DerSimonianLaird random effects should be the preferred method to estimate the difference in restricted mean survival time from an individual patient data metaanalysis or from a multicenter clinical trial.
Availability of data and materials
Data were used with permission obtained from the MACNPC Collaborative Group investigators, who agreed to share their data with us by signing an amendment to the original protocol. The French data protection authority (CNIL  Commission Nationale de l’Informatique et des Libertés) does not allow us to make these data publicly available.
Abbreviations
 ASE:

average standard error
 CI:

confidence interval
 CT:

chemotherapy
 ESE:

empirical standard error
 HR:

hazard ratio
 IPD:

individual patient data
 MACNPC:

MetaAnalysis of Chemotherapy in Nasopharynx Carcinoma
 RMST:

restricted mean survival time
 rmstD:

difference in restricted mean survival time
 RT:

radiotherapy
References
 1.
Irwin JO. The standard error of an estimate of expectation of life, with special reference to the expectation of tumour less life in experiments with mice. J Hygiene. 1949;47:188–9.
 2.
Andersen PK, Hansen MG, Klein JP. Regression analysis of restricted mean survival time based on pseudoobservations. Lifetime Data Anal. 2004;10:335–50.
 3.
Royston P, Parmar MK. Restricted mean survival time: an alternative to the hazard ratio for the design and analysis of randomized trials with a timetoevent outcome. BMC Med Res Methodol. 2013;13:152.
 4.
Zhao L, Claggett B, Tian L, Uno H, Pfeffer MA, Solomon SD, et al. On the restricted mean survival time curve in survival analysis. Biometrics. 2015. doi:10.1111/biom.12384.
 5.
Royston P, Parmar MK. The use of restricted mean survival time to estimate the treatment effect in randomized clinical trials when the proportional hazards assumption is in doubt. Stat Med. 2011;30:2409–21.
 6.
Wei Y, Royston P, Tierney JF, Parmar MK. Metaanalysis of timetoevent outcomes from randomized trials using restricted mean survival time: application to individual participant data. Stat Med. 2015;34:2881–98.
 7.
Jackson CH, Sharples LD, Thompson SG. Survival models in health economic evaluations: balancing fit and parsimony to improve prediction. Int J Biostat. 2010;6:34.
 8.
Latimer N. Survival analysis for economic evaluations alongside clinical trialsextrapolation with patientlevel data: inconsistencies, limitations, and a practical guide. Med Decis Making. 2013;33:743–54.
 9.
Gong Q, Fang L. Asymptotic properties of mean survival estimate based on the KaplanMeier curve with an extrapolated tail. Pharm Stat. 2012;11:135–40.
 10.
Lamb KE, Williamson EJ, Coory M, Carlin JB. Bias and precision of measures of survival gain from rightcensored data. Pharm Stat. 2015;14:409–17.
 11.
Altman DG, Deeks JJ. Metaanalysis, Simpson’s paradox, and the number needed to treat. BMC Med Res Methodol. 2002;2:3.
 12.
Cates CJ. Simpson’s paradox and calculation of number needed to treat from metaanalysis. BMC Med Res Methodol. 2002;2:1.
 13.
Rücker G, Schumacher M. Simpson’s paradox visualized: the example of the rosiglitazone metaanalysis. BMC Med Res Methodol. 2008;8:34.
 14.
Glidden DV, Vittinghoff E. Modelling clustered survival data from multicentre clinical trials. Stat Med. 2004;23:369–88.
 15.
Legrand C, Ducrocq V, Janssen P, Sylvester R, Duchateau L. A Bayesian approach to jointly estimate centre and treatment by centre heterogeneity in a proportional hazards model. Stat Med. 2005;24:3789–804.
 16.
Munda M, Legrand C. Adjusting for centre heterogeneity in multicentre clinical trials with a timetoevent outcome. Pharm Stat. 2014;13:145–52.
 17.
Michiels S, Baujat B, Mahé C, Sargent DJ, Pignon JP. Random effects survival models gave a better understanding of heterogeneity in individual patient data metaanalyses. J Clin Epidemiol. 2005;58:238–45.
 18.
Smith CT, Williamson PR, Marson AG. Investigating heterogeneity in an individual patient data metaanalysis of time to event outcomes. Stat Med. 2005;24:1307–19.
 19.
Rondeau V, Michiels S, Liquet B, Pignon JP. Investigating trial and treatment heterogeneity in an individual patient data metaanalysis of survival data by means of the penalized maximum likelihood approach. Stat Med. 2008;27:1894–910.
 20.
Stewart GB, Altman DG, Askie LM, Duley L, Simmonds MC, Stewart L. Statistical analysis of individual participant data metaanalyses: a comparison of methods and recommendations for practice. PLoS One. 2012;7:e46042.
 21.
Bowden J, Tierney JF, Simmonds M, Copas AJ, Higgins JP. Individual patient data metaanalysis of timetoevent outcomes: onestage versus twostage approaches for estimating the hazard ratio under a random effects model. Res Synth Methods. 2011;2:150–62.
 22.
Smith CT, Williamson PR. A comparison of methods for fixed effects metaanalysis of individual patient data with time to event outcomes. Clin Trials. 2007;4:621–30.
 23.
Andersen PK, Perme MP. Pseudoobservations in survival analysis. Stat Methods Med Res. 2010;19:71–99.
 24.
Royston P, Parmar MK. Flexible parametric proportionalhazards and proportionalodds models for censored survival data, with application to prognostic modelling and estimation of treatment effects. Stat Med. 2002;21:2175–97.
 25.
Early Breast Cancer Trialists’ Collaborative Group. Treatment of early breast cancer vol.1: worldwide evidence 1985–1990. Oxford: Oxford University Press; 1990.
 26.
Pignon JP, Le Maître A, Maillard E, Bourhis J, on behalf of the MACHNC Collaborative Group. Metaanalysis of chemotherapy in head & neck cancer (MACHNC): an update on 93 randomized trials and 17 346 patients. Radiother Oncol. 2009;92:4–14.
 27.
Early Breast Cancer Trialists’ Collaborative Group, Peto R, Davies C, et al. Comparisons between different polychemotherapy regimens for early breast cancer: metaanalyses of longterm outcome among 100,000 women in 123 randomised trials. Lancet. 2012;379:432–44.
 28.
DerSimonian R, Laird N. Metaanalysis in clinical trials. Control Clin Trials. 1986;7:177–88.
 29.
Karrison T. Use of Irwin’s restricted mean as an index for comparing survival in different treatment groups—interpretation and power considerations. Control Clin Trials. 1997;18:151–67.
 30.
DurandZaleski I, Roche B, Buyse M, Carlson R, O’Connell MJ, Rougier P, et al. Economic implications of hepatic arterial infusion chemotherapy in treatment of nonresectable colorectal liver metastases. J Natl Cancer Inst. 1997;32:790–5.
 31.
Rotolo F, Michiels S. Testing the treatment effect on competing causes of death in oncology clinical trials. BMC Med Res Methodol. 2014;14:72. doi:10.1186/147122881472.
 32.
Brown J, Hollander M, Korwar R. Nonparametric tests of independence for censored data with application to heart transplant studies. Reliability and Biometry, Statistical Analysis of Lifelength 1974;327–54
 33.
Burton A, Altman DG, Royston P, Holderal RL. The design of simulation studies in medical statistics. Stat Med. 2006;25:4279–92.
 34.
Baujat B, Audry H, Bourhis J, Chan A, Onat H, Chua D, et al. Chemotherapy in locally advanced nasopharyngeal carcinoma: an individual patient data metaanalysis of eight randomized trials and 1753 patients. Int J Radiat Oncol Biol Phys. 2006;64:47–56.
 35.
Blanchard P, Lee A, Marguet S, Leclercq J, Ng WT, Ma J, et al. Chemotherapy and radiotherapy in nasopharyngeal carcinoma: an update of the MACNPC metaanalysis. Lancet Oncol. 2015;16:645–55.
 36.
Grambsch P, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika. 1994;81:515–26.
 37.
Lueza B, Mauguen A, Pignon JP, RiveroArias O, Bonastre J. Difference in restricted mean survival time for costeffectiveness analysis using individual patient data metaanalysis: evidence from a case study. PLoS One. 2016. doi:10.1371/journal.pone.0150032.
 38.
Deeks JJ. Issues in the selection of a summary statistic for metaanalysis of clinical trials with binary outcomes. Stat Med. 2002;21:1575–600.
 39.
Commenges D, Andersen PK. Score test of homogeneity for survival data. Lifetime Data Anal. 1995;1:145–56.
 40.
Biard L, Porcher R, RescheRigon M. Permutation tests for centre effect on survival endpoints with application in an acute myeloid leukaemia multicentre study. Stat Med. 2014;33:3047–57.
 41.
Guyot P, Ades A, Ouwens MJ, Welton NJ. Enhanced secondary analysis of survival data: reconstructing the data from published KaplanMeier survival curves. BMC Med Res Methodol. 2012;12:9.
 42.
Guyot P, Welton NJ, Ouwens MJNM, Adesa E. Survival time outcomes in randomized, controlled trials and metaanalyses: the parallel universes of efficacy and costeffectiveness. Value Health. 2011;14:640–6.
 43.
Jansen JP. Network metaanalysis of survival data with fractional polynomials. BMC Med Res Methodol. 2011;11:61.
 44.
Ouwens MJNM, Philips Z, Jansen JP. Network metaanalysis of parametric survival curves. Res Synth Methods. 2010;1:258–71.
 45.
Siannis F, Barrett JK, Farewell VT, Tierney JF. Onestage parametric metaanalysis of timetoevent outcomes. Stat Med. 2010;29:3030–45.
 46.
Barrett JK, Farewell VT, Siannis F, Tierney J, Higgins JPT. Twostage metaanalysis of survival data from individual participants using percentile ratios. Stat Med. 2012;31:4296–308.
 47.
Karrison T. Restricted mean life with adjustment for covariates. J Am Stat Assoc. 1987;82:1169–76.
 48.
Tian L, Zhao L, Wei LJ. Predicting the restricted mean event time with the subject’s baseline covariates in survival analysis. Biostatistics. 2014;15:222–33.
Acknowledgements
We thank the members of the MACNPC Collaborative Group [see [34] for the list of the members] who agreed to share their data. We are grateful to Oliver RiveroArias for scientific discussion and support.
Funding
This work was supported by ITMO Cancer and IReSP (French Public Health Research Institute) as part of the French “Plan Cancer 2009–2013”, by the French “Programme Hospitalier de Recherche Clinique”, and by the French “Ligue Nationale Contre le Cancer”. The funding sources had no role in study design, data collection, data analysis, data interpretation, or manuscript writing.
Author information
Additional information
Competing interest
The authors declare that they have no competing interests.
Authors’ contributions
BL and FR performed the statistical analyses and drafted the manuscript. BL, FR, JB, JPP, and SM jointly contributed to writing the study protocol, interpreting and discussing results, and to writing the article. BL, FR, JB, JPP, and SM read and approved the final manuscript.
An erratum to this article can be found at http://0dx.doi.org.brum.beds.ac.uk/10.1186/s128740160154y.
Additional file
Additional file 1:
Supplementary statistical details, Tables S1S7 and Figures S1S2. (DOCX 339 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Lueza, B., Rotolo, F., Bonastre, J. et al. Bias and precision of methods for estimating the difference in restricted mean survival time from an individual patient data metaanalysis. BMC Med Res Methodol 16, 37 (2016) doi:10.1186/s128740160137z
Received
Accepted
Published
DOI
Keywords
 Restricted mean survival time
 Survival benefit
 Metaanalysis
 Multicenter clinical trial
 Survival analysis
 Simulation study