 Research
 Open Access
 Published:
Independence estimators for rerandomisation trials in multiepisode settings: a simulation study
BMC Medical Research Methodology volume 21, Article number: 235 (2021)
Abstract
Background
Rerandomisation trials involve reenrolling and rerandomising patients for each new treatment episode they experience. They are often used when interest lies in the average effect of an intervention across all the episodes for which it would be used in practice. Rerandomisation trials are often analysed using independence estimators, where a working independence correlation structure is used. However, research into independence estimators in the context of rerandomisation has been limited.
Methods
We performed a simulation study to evaluate the use of independence estimators in rerandomisation trials. We focussed on a continuous outcome, and the setting where treatment allocation does not affect occurrence of subsequent episodes. We evaluated different treatment effect mechanisms (e.g. by allowing the treatment effect to vary across episodes, or to become less effective on reuse, etc), and different nonenrolment mechanisms (e.g. where patients who experience a poor outcome are less likely to reenrol for their second episode). We evaluated four different independence estimators, each corresponding to a different estimand (perepisode and perpatient approaches, and addedbenefit and policybenefit approaches).
Results
We found that independence estimators were unbiased for the perepisode addedbenefit estimand in all scenarios we considered. We found independence estimators targeting other estimands (perpatient or policybenefit) were unbiased, except when there was differential nonenrolment between treatment groups (i.e. when different types of patients from each treatment group decide to reenrol for subsequent episodes). We found the use of robust standard errors provided close to nominal coverage in all settings where the estimator was unbiased.
Conclusions
Careful choice of estimand can ensure rerandomisation trials are addressing clinically relevant questions. Independence estimators are a useful approach, and should be considered as the default estimator until the statistical properties of alternative estimators are thoroughly evaluated.
Introduction
Rerandomisation trials can be used to evaluate interventions in multiepisode settings, where some patients require treatment on more than one occasion [1,2,3,4,5,6]. In rerandomisation trials, patients are reenrolled and rerandomised for each new treatment episode they experience (providing they continue to remain eligible). The number of times each patient is enrolled is not specified by the design, but instead is based on the number of treatment episodes they experience during the trial [4]. The two key design requirements for rerandomisation trials are that (i) patients are only reenrolled when the followup period from their previous enrolment is complete; and (ii) randomisations for the same patient are independent [4].
The use of rerandomisation can increase efficiency, as a larger number of available treatment episodes are enrolled [3,4,5], and can also help to address questions about the average effect of the intervention across all episodes for which it would be used in practice [1]. Rerandomisation trials have been used to evaluate interventions for sickle cell pain crises (where patients are rerandomised for each new pain crisis) [7], severe asthma exacerbations (where patients are rerandomised for each new exacerbation) [8], influenza vaccines (where patients are rerandomised for each new influenza seasons) [9], invitro fertilisation (where participants are rerandomised for each new cycle) [10], and preterm birth (where participants are rerandomised for each new pregnancy) [11].
Independence estimators have been proposed for rerandomisation trials, where a working independence correlation structure is used [1, 4], and this approach is commonly used in practice [5]. However, prior methodological work around these estimators is limited. First, previous work has primarily focussed on the setting where the treatment effect is the same across all patients and episodes [2, 4, 6]. However, this may not always be the case in practice; for instance, the intervention may become more or less effective the more often it is used, or patients with more severe underlying conditions may be both predisposed to experience a larger number of episodes, and always have worse responses to treatment.
Second, most research has been in the setting where there is no differential nonenrolment (i.e. in the setting where the same type of patients from each treatment group reenrols for subsequent episodes). However, this may not always be the case in practice; for instance, in openlabel trials where patients are aware of their treatment allocation, their probability of reenrolling may be affected by the combination of prior treatment allocation as well as their response to treatment.
Third, the use of robust standard errors which allow for clustering have been proposed [1], however their use has never been empirically evaluated for rerandomisation trials.
Finally, most previous literature has focussed on an estimator which targets a perepisode addedbenefit estimand, which represents the average treatment effect across episodes, over and above any benefit from previous assignment to the intervention [1]. This evaluation is warranted, as this is the estimator which is most commonly used in practice [5]; however, other estimands have been proposed [1], and it would be of interest to evaluate the use of independence estimators for these alternate estimands.
The purpose of this paper is to comprehensively evaluate the use of independence estimators in rerandomisation trials using a large simulation study. In particular, we will evaluate their use (i) for different estimands; (ii) under nonconstant treatment effect mechanisms; (iii) under differential nonenrolment; and (iv) in conjunction with robust standard errors. For simplicity, we focus on a setting where patients experience a maximum of two episodes, and where the outcome of interest is continuous. We also focus on the setting where the interventions under study do not affect whether future episodes occur (i.e. patients would experience the same number of episodes during the trial period regardless of which treatments they are allocated to), though we note rerandomisation trials can also be used when this is not the case.
Methods
Notation
We briefly summarise some of the notation that will be used in this article. Some further notation is introduced in later sections as required.
Let i index patient, and j index the episode number within the trial. Then, let Y_{ij} denote the outcome for patient i at episode j, and Z_{ij} denote their treatment allocation (0 = control, 1 = intervention). \({\overset{\sim }{Z}}_{ij}\) is the patient’s ‘treatment history’, which denoates their treatment allocations in their previous episodes (e.g. \({\overset{\sim }{Z}}_{13}\) would be the vector (Z_{11}, Z_{12})). Then, let \(Y_{ij}^{\left(Z_{ij}=0,\widetilde{Z_{ij}}=\widetilde{Z_{ij}}\right)}\) represent the potential outcome under Z = 0 and treatment history \(\overset{\sim }{Z_{ij}}=\overset{\sim }{z_{ij}}\). For clarity, we drop subscripts inside the brackets, as these are the same as subscripts on the outside of the brackets; for example, \(Y_{ij}^{\left(Z=1,\widetilde Z=\widetilde z\right)}\) is the same as \(Y_{ij}^{\left(Z_{ij}=1,\widetilde{Z_{ij}}=\widetilde{Z_{ij}}\right)}\) .
Finally, let M_{i} be the number of episodes for which patient i is enrolled in the trial, M_{T} be the total number of episodes enrolled, and M_{T(j)} be the total number of patients for whom M_{i} = j (i.e. the number of patients enrolled for j episodes). There are N_{T} total patients enrolled in the trial, and N_{j} represents the number of patients who are enrolled for at least j episodes (i.e. for whom M_{i} ≥ j).
Simulation study
We conducted a large simulation study to evaluate the bias and coverage of independence estimators for rerandomisation trials [12]. This simulation study focussed on a setting where patients were enrolled in a trial for a maximum of two episodes. For most scenarios, we chose parameter values that are larger than those we might expect to see in practice; this was to ensure that if estimators were biased in any scenarios, we would be able to identify it.
We performed three different sets of simulations (labelled simulation study 1, 2a, and 2b). We describe the estimands, methods of analysis, and performance measures used across all three simulation studies below, and then describe the data generating models for each of the three simulation studies.
All simulations were conducted using Stata v15.1.
Estimands
For all simulation scenarios, we used the following four estimands, which have been described previously [1]: (a) perepisode addedbenefit; (b) perpatient addedbenefit; (c) perepisode policybenefit; and (d) perpatient policybenefit.
Descriptions of these estimands are available in Table 1, and are described briefly in the sections below; full details are available in reference [1]. We calculated the true value of each estimand for each simulation scenario using the method described in reference [1]; for simulation scenario 1, we calculated these values analytically, and for simulation scenarios 2a and 2b, we calculated these by simulating a single large dataset (this process is described further below); the reason we used simulation for scenarios 2a and 2b was because they involved nonenrolment (i.e. some patients did not reenrol for their 2nd episode), which made the analytical calculations more challenging.
Perepisode addedbenefit estimand
The perepisode addedbenefit estimand is defined as:
where (IJ)^{E} represents a randomly selected episode from the trial (each with equal probability), and so \({Y}_{(IJ)^E}\) represents the outcome for a randomly selected episode.
In this estimand, each episode is given equal weight. It addresses the questions: what is the average treatment effect across episodes, over and above any benefit of the intervention from previous episodes? Broadly, it measures the benefit of the intervention conditional on a shared treatment history (i.e. it measures the difference in potential outcomes under intervention vs. control, where both potential outcomes share a common treatment history), and then takes a weighted average of this effect across the different treatment histories. Importantly, the treatment effect here measures the benefit conferred from treatment in the current episode, over and above any benefit carried forward from being assigned intervention in previous episodes.
Perpatient addedbenefit estimand
The perpatient addedbenefit estimand is defined as:
where (IJ)^{P} represents a randomly selected episode from a randomly selected patient (i.e. first a patient is randomly selected from the trial, each with equal probability; and then an episode from within that patient is selected, each with equal probability). Then \({Y}_{(IJ)^P}\) represents the outcome for randomly selected episode from a randomly selected patient.
In this estimand, each patient is given equal weight. It addresses the question: what is the average treatment effect across patients, over and above any benefit of the intervention from previous episodes? Broadly, it measures an average of the patientspecific treatment effects. As above, the treatment effect here measures the benefit conferred from treatment in the current episode, over and above any benefit carried forward from being assigned intervention in previous episodes.
Perepisode policybenefit estimand
The perepisode policybenefit estimand is defined as:
where \(\overset{\sim }{Z}=\overset{\sim }{1}\) denotes the patient has been assigned to intervention in all previous episodes (and vice versa for \(\overset{\sim }{Z}=\overset{\sim }{0}\)).
In this estimand, each episode is given equal weight. It addresses the question: what is the average treatment effect across episodes, where patients are assigned intervention for all episodes vs. control for all episodes? Broadly, it measures the difference in potential outcomes under intervention for this and all previous episodes vs. control for this and all previous episodes.
Perpatient policybenefit estimand
The perpatient policybenefit estimand is defined as:
In this estimand, each patient is given equal weight. It addresses the question: what is the average treatment effect across patients, where patients are assigned intervention for all episodes vs. control for all episodes? Broadly, it measures the difference in potential outcomes under intervention for this and all previous episodes vs. control for this and all previous episodes.
Methods of analysis
We implemented four independence estimators, each corresponding to one of the four the estimands listed above. Briefly, we used a working independence correlation structure in conjunction with clusterrobust standard errors, with patients acting as the cluster [13]. Although working correlation structures are typically used in conjunction with generalised estimating equations, here we use them with linear regression models which implicitly assume an independence correlation structure. The main benefit to using linear regression models here is that inference can be based on the tdistribution, which is not the case with generalised estimating equations.
These estimators are fully described in the sections below, and a summary is provided in Table 1. A full overview of these estimators is provided in reference (1), and details of how these estimators were implemented in Stata is shown in Table 2.
Perepisode addedbenefit estimator
We used the following estimator, which corresponds to a simple difference in means between all intervention episodes vs. all control episodes:
This estimator can be rewritten to show that it compares intervention vs. control for episodes with a common treatment history; because this broadly matches the estimand (with the key difference being the estimand is based on a comparison of potential outcomes from the same patient, whereas the estimator is based on a comparison of randomised groups), we expect this estimator to be unbiased.
Perpatient addedbenefit estimator
The perpatient estimator can be obtained by weighting each patient by the inverse of their number of episodes, i.e. \({W}_i=\frac{1}{M_i}\):
Perepisode policybenefit estimator
The policybenefit estimators can be obtained using a twostep approach. In the first step, a causal model is specified for the effect of treatment history on the potential outcomes; then in the second step, estimates of the difference in potential outcomes obtained in the first step are combined into an overall estimate. We note that the exact models used to calculate this effect can vary (particularly depending on the number of episodes each patient experiences, e.g. more complex models would be required for trials in which some patients experience >2 episodes). Here, we described the estimator we used in the setting where patients experience a maximum of two episodes.
In the first step, we used the following model:
where Z_{i, j − 1} is the treatment allocation in the previous episode (and is set to 0 for j = 1), and \({X}_{e{p}_{ij}}\) is an indicator for episode 2 (i.e. \({X}_{e{p}_{ij}}=1\) for episode 2, and 0 otherwise). This model allows the effect of the intervention in episode 1 to carry forward into episode 2 (the term γ), and for the intervention to get more (or less) effective the 2nd time it is used (the term δ).
Then, in the second step, we use the estimates \(\hat{\upbeta}\), \(\hat{\upgamma}\), and \(\hat{\updelta}\) to calculate an overall estimate of the policybenefit effect. Here, we estimate the difference in potential outcomes as \(\hat{\upbeta}\) for all first episodes, and as \(\hat{\upbeta}+\hat{\upgamma}+\hat{\updelta}\) for all second episodes. The formula for the overall estimate is:
This formula weights \(\hat{\upbeta}\) and \(\hat{\upbeta}+\hat{\upgamma}+\hat{\updelta}\) by the proportion of episodes to which they correspond. We have included the term \({X}_{e{p}_{ij}}\) in model [3] even though it is not used directly to estimate the treatment effect, as it is associated with Z_{i, j − 1}, and so can act as a confounder if omitted from the model.
Perpatient policybenefit estimator
The perpatient policybenefit estimator is obtained in the same way as the perepisode policybenefit estimator, except estimates from model [3] are obtained using weighted leastsquares, where each patient is weighted by the inverse of their number of episodes, \({W}_i=\frac{1}{M_i}\). After obtaining estimates and calculating the difference in potential outcomes for each episode in the same manner as above, the overall treatment effect is calculated as:
where M_{T(j)} represents the total number of patients for whom M_{i} = j. In this equation, \(\frac{M_{T(1)}}{N_T}\left(\hat{\upbeta}\right)\) is the component for patients where M_{i} = 1, and \(\frac{M_{T(2)}}{N_T}\left(\frac{1}{2}\hat{\upbeta}+\left(\frac{1}{2}\right)\left(\hat{\upbeta}+\hat{\upgamma}+\hat{\updelta}\right)\right)\) is the component for patients where M_{i} = 2 (with \(\frac{1}{2}\hat{\upbeta}\) being the 1st episode component, and \(\left(\frac{1}{2}\right)\left(\hat{\upbeta}+\hat{\upgamma}+\hat{\updelta}\right)\) being the 2nd episode component).
Performance measures
The main aim of this simulation study was to evaluate bias, though a secondary aim was to evaluate the coverage of 95% confidence intervals in settings where estimators were unbiased. We did not evaluate the precision of the different estimators, as each estimator addressed a different question and so precision is less relevant in deciding between them.
We measured bias as \(E\left(\hat{\upbeta}\right)\upbeta\), where \(E\left(\hat{\upbeta}\right)\) represents the mean of the estimates across all simulation replications, and β represents the true value of the estimand. We compared each estimator against its corresponding estimand (i.e. \({\hat{\upbeta}}_E^{AB}\) vs. \({\upbeta}_E^{AB}\), \({\hat{\upbeta}}_P^{AB}\) vs. \({\upbeta}_P^{AB}\), etc). We considered any estimator where the Monte Carlo standard error 95% confidence interval for bias (described below) did not include 0 (denoting unbiasedness) to be problematic.
We also evaluated coverage of the 95% confidence intervals. We defined coverage as the proportion of replications for which the 95% confidence interval of the estimator contained the true value of the estimand.
For each performance measure (bias, coverage) we also assessed the Monte Carlo standard error (MCSE), which provides a measure of variability for the estimated performance measure in the simulation study. We present the MCSEs as 95% confidence intervals alongside the mean bias and coverage, except in cases where this interval was too small to show up on the figure (i.e. when the width of the confidence interval was smaller than the size of the dot representing the mean bias or coverage), in which case we report the range of Monte Carlo standard errors for each performance measure across scenarios.
We used 10,000 replications for all simulation scenarios.
Simulation study 1: patients enrolled for all episodes they experience
Data generating methods
Simulation study 1 is based on a trial of 300 patients; 150 patients experience one episode during the trial period, and 150 experience two episodes (i.e. N_{T} = 300, M_{T} = 450, M_{T(1)} = 150, and M_{T(2)} = 150.
The main purpose of this simulation study is to evaluate estimators in the setting where patients are enrolled for all episodes they experience; that is, the 150 patients who experienced two episodes were enrolled in the trial for both episodes (i.e. there are no patients who do not reenrol for their 2nd episode).
We consider six different data generating mechanisms (described further below); all were based on the following general model for a continuous outcome:
where \({X}_{e{p}_{ij}}\) is an indicator variable for episode 2, and \({X}_{M_i}\) is an indicator variable for patients with M_{i} = 2. A description of the variables in this model are given in Table 3 (this table also contains some variables which are not in eq. (5), but are used in simulation studies 2a and 2b. Higher values of the outcome are better.
The parameter α is an intercept, β_{ep} and β_{M} are the effects of episode 2 and patient type (whether they experience 1 vs. 2 episodes) on outcome, and β_{trt}, β_{TRTxEP}, β_{TRTxM}, γ, and δ are components of the treatment effect (e.g. β_{TRTxEP} is the interaction between treatment allocation and episode number, β_{TRTxM} is the interaction between treatment and patient type, and δ is the interaction between treatment allocation in the current episode and allocation in the previous episode).
In this study, we considered six different treatment effect mechanisms. This involved varying the parameters that define the treatment effect (β_{trt}, β_{TRTxEP}, β_{TRTxM}, γ, and δ); values of these parameters for each scenario are shown in Table 4, along with the values of the four estimands for each scenario. For each scenario, we set α = 0, β_{trt} = 3, β_{ep} = 1, β_{M} = 1, \({\upsigma}_{\upmu}^2=5\) and \({\upsigma}_{\upvarepsilon}^2=5\). We generated μ_{i} and ε_{ij} independently; based on the chosen variances, the intraclass correlation between episodes from the same patient is 0.50 (conditional on the other variables in the data generating model).
We considered the following treatment effect mechanisms:

1.
Constant treatment effect: the treatment effect is the same (β_{trt}) across all episodes and patients

2.
Treatment effect varies across episode: the treatment effect is different in the 1st episode (β_{trt}) vs. in the 2nd episode (β_{trt} + β_{TRTxEP})

3.
Treatment effect varies across patients with different values of M_{i}: the treatment effect is different in patients who experience 1 episode (β_{trt}) vs. those who experience 2 episodes (β_{trt} + β_{TRTxM}).

4.
Treatment effect carries forward into the 2nd episode: patients who receive intervention in the first episode have better outcomes in their 2nd episode (by the amount γ)

5.
Treatment becomes less effective on reuse: patients receiving the intervention for the 1st time have a different treatment effect (β_{trt}) than those receiving the intervention for the 2nd time (β_{trt} + δ)

6.
Treatment effect varies across episodes, across patients with different values of M_{i}, carries forward, and becomes less effective on reuse: the treatment effect is β_{trt} for patients who experience one episode. For patients who experience two episodes, the treatment effect is β_{trt} + β_{TRTxM} in the 1st episode, β_{trt} + β_{TRTxM} + β_{TRTxEP} in the 2nd episode for patients receiving the intervention for the first time (i.e. who received control in their 1st episode), and β_{trt} + β_{TRTxM} + β_{TRTxEP} + δ in the 2nd episode for patients receiving the intervention for the 2nd time (i.e. received intervention in their 1st episode). Patients who receive the intervention in the first episode also have better outcomes in their 2nd episode, by the amount γ.
We note that under certain treatment effect mechanisms, the values of certain estimands will coincide. Briefly, the addedbenefit and policybenefit estimands will coincide when treatment history does not influence either the outcome or treatment effect in the current episode, and the perepisode and perpatient estimands will coincide when the cluster size is not informative [14,15,16,17,18,19], i.e. when a patient’s average treatment effect across episodes does not depend on the number of episodes they experience. For further details on when estimand values will coincide, see reference (1).
Simulation study 2a: some patients do not reenrol for their 2nd episode
Data generating methods
The main purpose of this simulation study is to evaluate estimators when some of the patients who experience two episodes do not reenrol in the trial for their second episode. For example, this may occur if patients find the trial procedures, such as number of followup visits, too burdensome; if they were disappointed at their treatment allocation in the first episode; or they experienced a poor outcome in their first episode.
As before, this simulation study is based on a trial of 300 patients; 150 patients experience one episode during the trial period, and 150 experience two episodes. All patients enrol for their first episode, but a subset of patients who experience two episodes do not reenrol for their second episode. Therefore, N_{T} = 300 and M_{T(1)} = 150, however M_{T(2)} < 150 and M_{T} < 450; the exact values of M_{T(2)} and M_{T} vary across simulation replications.
We simulated data by first generating outcomes for all 450 episodes (regardless of whether they were enrolled in the trial for their 2nd episode) using model [6] below, and then generated an indicator for each episode to denote whether it was enrolled in the trial or not using model [7] below. We then performed analysis only on the subset of enrolled episodes. We used six different treatment effect mechanisms (based on model [6] below) and five different nonenrolment mechanisms (based on model [7] below), leading to 6 × 5 = 30 total scenarios. The different treatment effect and nonenrolment scenarios are described below.
We generated continuous outcomes from the model:
This model is identical to model [5] from simulation study 1, except it contains two additional terms: \({X}_{P{L}_i}\) and \({X}_{E{L}_{ij}}\), which are unobserved binary covariates, with \({X}_{P{L}_i}\) being a patientlevel covariate which does not vary across episodes, and \({X}_{E{L}_{ij}}\) being an episodelevel covariate which can vary across episodes for the same patient; we use the subscript PL to denote ‘patientlevel’, and EL to denote ‘episodelevel’. The purpose of including \({X}_{P{L}_i}\) and \({X}_{E{L}_{ij}}\) in this model is to allow differential nonenrolment to be generated (this is explained further below). We used positive values for \({\upbeta}_{X_{PL}}\) and \({\upbeta}_{X_{EL}}\), so that patients or episodes where \({X}_{P{L}_i}=1\) or \({X}_{E{L}_{ij}}=1\) have better outcomes than if \({X}_{P{L}_i}\) or \({X}_{E{L}_{ij}}\) are 0; exact values of \({\upbeta}_{X_{PL}}\) and \({\upbeta}_{X_{EL}}\) for each scenario are shown in Table 5.
In the subset of patients with two episodes, we generated each patient’s probability of not reenrolling for the second episode on a linear scale using the following model:
where R_{ij} denotes whether patient i was enrolled for their j th episode (0 = not enrolled, 1 = enrolled). Note that R_{i2} = 0 for patients who only experience one episode, and R_{i1} = 1 for all patients. We use the superscript R_{2} for parameters to indicate that these parameters relate to the probability of not being reenrolled for the 2nd episode. We set \({\upalpha}^{R_2}=0.05\) and \({\upgamma}^{R_2}=0.10\) for all scenarios. This implies that all patients have a nonzero probability of not reenrolling for their second episode, and that patients who received intervention in episode 1 are more likely to not reenrol than those in the control group (irrespective of \({X}_{P{L}_i}\) and \({X}_{E{L}_{i2}}\)). Values for other parameters are shown in Table 5.
In the models above, we use \({X}_{P{L}_i}\) as a marker of the patient’s outcome in episode 1, and \({X}_{E{L}_{i2}}\) as a marker for the patient’s expected outcome in episode 2; that is, larger values of \({\upbeta}_{X_{PL}}^{R_2}\) denote that patients with better outcomes in episode 1 are less likely to reenrol in the trial for their 2nd episode, and larger values of \({\upbeta}_{X_{EL}}^{R_2}\) denote that patients with better expected outcomes in episode 2 are less likely to reenrol for that episode.
As stated above, we used six treatment effect mechanisms and five nonenrolment mechanisms. We used the same six treatment effect mechanisms as used in simulation study 1 (shown in Table 4), apart from the addition of \({X}_{P{L}_i}\) and \({X}_{E{L}_{ij}}\) to the model (as shown in model [6]). All other parameter values were the same as in Table 4, though the estimand values differed (these are described below). The five nonenrolment scenarios are shown in Table 5.
For each scenario, we calculated estimands based on the set of episodes enrolled in the trial. We calculated the true value of each estimand by generating a single large dataset of 1,000,000 patients (1,500,000 episodes), and then excluding episodes according to model [7] above. We then generated both addedbenefit and policybenefit potential treatment effects for each episode, and calculated the true value of the relevant estimand based on these. These estimand values are shown in the supplementary material.
Simulation study 2b: further exploring bias associated with perpatient and policybenefit estimators under nonenrolment scenarios 4 and 5
We developed simulation study 2b to further explore some of the results from simulation study 2a, pertaining to bias in the perpatient and policybenefit estimators under certain nonenrolment mechanisms. We generated outcomes and nonenrolment in the same way as in simulation study 2a, but used a wider range of parameter values in order to assess how large the relevant parameter values needed to be in order for bias to become apparent. Full details of the data generation methods, parameter values, and results are available in the supplementary material.
Results
Simulation study 1: patients enrolled for all episodes they experience
Results are shown in Fig. 1. All estimators were unbiased and provided close to nominal coverage in all scenarios.
Simulation study 2a: some patients do not reenrol for their 2nd episode
Results are shown in Figs. 2 and 3. The perepisode addedbenefit estimator was unbiased across all scenarios, and had close to nominal coverage. The perpatient and policybenefit estimators were unbiased across most scenarios, however, we identified several sources of bias which we discuss further below. Coverage of 95% confidence intervals was close to nominal for all settings in which estimators were unbiased.
The perpatient addedbenefit estimator was biased for nonenrolment scenario 4, where nonenrolment was differential across treatment groups based on previous outcome. We also identified a small bias in nonenrolment scenario 5 (where nonenrolment is differential across treatment groups based on prognosis at episode 2) under treatment effect scenarios 3 and 6 (when the size of the treatment effect varied across patients with different values of M_{i}). This bias was much smaller than that seen in nonenrolment scenario 4, but may still be large enough to cause concern.
The policybenefit estimators (both perpatient and perepisode) were biased in nonenrolment scenarios 4 and 5. This occurred despite the fact that these estimators correctly modelled the causal effect of the previous treatment allocation on the outcome and treatment effect. This bias was a result of the model providing biased estimates of the parameter γ in these scenarios (which represents the effect of the previous allocation on outcome); because this parameter is used to construct policybenefit estimates, these in turn will also be biased.
In these scenarios, episode 1 intervention patients with good outcomes were less likely to reenrol for episode 2. At episode 2 therefore, most patients with a good outcome would have been allocated control in the previous episode. This created a false association between previous treatment allocation and outcome, which led to biased estimates of γ.
Interestingly, the perpatient policybenefit estimator had negligible bias for nonenrolment scenario 4; this is likely because the perpatient estimator is biased upwards in this scenario, and the policybenefit estimator is biased downwards, and the two biases cancel each other to some degree; however, under different parameter values it is likely that one of the biases would overtake the other, and the estimator would be biased. This is explored further in simulation study 2b below.
Simulation study 2b: further exploring bias associated with per-patient and policy-benefit estimators under non-enrolment scenarios 4 and 5
Full results are available in the supplementary material. Briefly, the policy-benefit estimators were biased when either \(X_{PL_i}\) or \(X_{EL_{ij}}\) had strong associations with both outcome and probability of non-enrolment. When either association was small, bias was minimal, except when the other association was extremely large.
Similarly, the per-patient added-benefit estimator was biased when \(X_{PL_i}\) had a strong association with both outcome and probability of non-enrolment; when either of these associations was small, bias was negligible, except when the other association was extremely large.
Unlike in simulation study 2a, we found the per-patient policy-benefit estimator was biased in certain settings, indicating that the two competing biases will not always cancel out.
As expected, the per-episode added-benefit estimator was unbiased in all scenarios.
Discussion
In this article we report results from a large simulation study evaluating the use of independence estimators in re-randomisation trials. We found that the per-episode added-benefit estimator was unbiased across all scenarios considered. The per-patient and policy-benefit estimators were also unbiased under the assumption of no differential non-enrolment (provided, for the policy-benefit estimator, that the causal model was correctly specified). Furthermore, we found that the use of a robust standard error provided close to nominal coverage in all settings where the estimator was unbiased. These results suggest that the re-randomisation design alongside an independence estimator is a potentially useful option to estimate relevant treatment effects in multi-episode settings, though if per-patient or policy-benefit estimators are used it may be useful to conduct sensitivity analyses to evaluate how robust results are to violations of the above assumptions [20]. Furthermore, since most analyses of re-randomisation trials have used an independence per-episode added-benefit estimator, this article provides reassurance that reported results from these trials are unbiased.
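Concretely, the per-episode added-benefit independence estimator amounts to ordinary least squares of the outcome on treatment pooled across all episodes, with a patient-clustered sandwich (robust) standard error. The sketch below is a minimal numpy illustration of this idea, not the Stata code used in the study; the helper name `independence_estimate` and the data-generating values are hypothetical.

```python
import numpy as np

def independence_estimate(y, a, patient_id):
    """OLS of outcome y on treatment a pooled over all episodes (working
    independence), with a cluster-robust sandwich SE by patient."""
    X = np.column_stack([np.ones(len(y)), a]).astype(float)
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ (X.T @ y)
    resid = y - X @ beta
    # Sandwich "meat": sum over patients of (X_i' e_i)(X_i' e_i)'
    meat = np.zeros((2, 2))
    for pid in np.unique(patient_id):
        idx = patient_id == pid
        s = X[idx].T @ resid[idx]
        meat += np.outer(s, s)
    cov = bread @ meat @ bread
    return beta[1], float(np.sqrt(cov[1, 1]))  # effect estimate, robust SE

# Illustration: two episodes per patient, outcomes correlated through a
# shared patient effect u; treatment is independently re-randomised each
# episode; true added benefit is 0.5 (arbitrary illustrative values)
rng = np.random.default_rng(1)
n = 5_000
u = np.repeat(rng.normal(0, 1, n), 2)
pid = np.repeat(np.arange(n), 2)
a = rng.integers(0, 2, 2 * n)
y = u + 0.5 * a + rng.normal(0, 1, 2 * n)
est, se = independence_estimate(y, a, pid)
print(f"estimate = {est:.3f}, robust SE = {se:.3f}")
```

The sandwich variance sums score contributions within each patient before squaring, which is what makes the standard error valid despite the (deliberately misspecified) working independence assumption.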
The results in this article are based on simulation, and so are limited to the specific scenarios studied. We used a wide range of treatment effect mechanisms and non-enrolment scenarios, and the results agree with previous analytical results [1]. However, the results found here may not apply in other settings, for instance when the sample size is very small [4]. It would also be of interest to evaluate these methods in a re-analysis of a published re-randomisation trial. The design is still quite new, and there are few published trials in the literature; however, further methodological work showing the design can provide robust answers may lead to increased uptake.
In this paper we focussed on the setting where the interventions under study do not affect whether future episodes occur. This is a plausible assumption for some trials (e.g. a trial of high-dose ibuprofen to manage pain in acute sickle cell pain crises [21], where ibuprofen will not influence whether participants experience subsequent pain episodes), but will almost certainly be false in others (e.g. a trial of in vitro fertilisation, where a treatment success precludes further treatment episodes). In yet other settings the validity of this assumption will be unknown (e.g. a trial of a drug to treat symptoms of severe asthma exacerbations, where it is unknown whether the underlying mechanism of the intervention may delay or even prevent subsequent exacerbations). If treatment does affect the occurrence of subsequent episodes, then the policy-benefit and per-patient estimands defined in this paper are no longer valid, and so estimation of these effects would lead to results with no clear interpretation. However, the per-episode added-benefit estimand still applies in settings where treatment affects subsequent episodes [1]. Therefore, if re-randomisation is being used where it is possible that treatment may affect the occurrence of future episodes, we recommend using a per-episode added-benefit treatment effect, to ensure results are valid and interpretable.
Further extensions to the work in this paper would be useful. In particular, it would be useful to compare re-randomisation to alternative designs. For instance, a cluster design, where participants are assigned to intervention for all episodes vs. control for all episodes, may be a more robust way to estimate policy-benefit estimands. However, cluster designs may be affected by selection bias, where patients decide not to re-enrol based on knowledge of which treatment group they are in. This is not an issue in re-randomisation trials, as patients are randomised between treatments at each re-enrolment (though of course, participants may not re-enrol due to past treatments). Further work comparing these designs would therefore be useful.
It would also be useful to evaluate whether alternatives to independence estimators, such as mixed-effects models, are suitable. Previous research has found they can be biased when the treatment effect carries forward into subsequent episodes; however, it is not known how well these models work under other treatment effect mechanisms, such as when the treatment effect varies across patients or across episodes. We only considered a setting where patients were enrolled for a maximum of two episodes; it would be useful to evaluate policy-benefit estimators in the setting where patients experience a larger number of episodes, particularly as specifying an appropriate causal model in these settings will be more challenging. We found that robust standard errors worked well in the settings considered; however, our primary concern was bias of estimators, and we designed the simulation study with this in mind. Follow-up simulation studies designed specifically to address the issue of robust vs. model-based standard errors would be useful. Finally, as discussed above, we only considered the setting where treatment allocation does not affect the occurrence of subsequent episodes; it would be of interest to extend the policy-benefit and per-patient estimands to settings where this is not the case.
Conclusions
Careful choice of estimand can ensure re-randomisation trials address clinically relevant questions. Independence estimators are a useful approach, and should be considered the default estimator until the statistical properties of alternative estimators are thoroughly evaluated.
Availability of data and materials
The Stata code used to generate and analyse data in the simulation study is available in supplementary material file 2.
References
1. Kahan BC, White IR, Hooper R, Eldridge S. Re-randomisation trials in multi-episode settings: estimands and independence estimators. OSF (https://osf.io/ujg46/). 2020.
2. Dunning AJ, Reeves J. Control of type 1 error in a hybrid complete two-period vaccine efficacy trial. Pharm Stat. 2014;13(6):397–402.
3. Kahan BC. Using re-randomization to increase the recruitment rate in clinical trials - an assessment of three clinical areas. Trials. 2016;17(1):595.
4. Kahan BC, Forbes AB, Dore CJ, Morris TP. A re-randomisation design for clinical trials. BMC Med Res Methodol. 2015;15:96.
5. Kahan BC, Morris TP, Harris E, Pearse R, Hooper R, Eldridge S. Re-randomization increased recruitment and provided similar treatment estimates as parallel designs in trials of febrile neutropenia. J Clin Epidemiol. 2018;97:14–9.
6. Nason M, Follmann D. Design and analysis of crossover trials for absorbing binary endpoints. Biometrics. 2010;66(3):958–65.
7. Morris CR, Kuypers FA, Lavrisha L, Ansari M, Sweeters N, Stewart M, et al. A randomized, placebo-controlled trial of arginine therapy for the treatment of children with sickle cell disease hospitalized with vaso-occlusive pain episodes. Haematologica. 2013;98(9):1375–82.
8. Stokholm J, Chawes BL, Vissing NH, Bjarnadottir E, Pedersen TM, Vinding RK, et al. Azithromycin for episodes with asthma-like symptoms in young children aged 1–3 years: a randomised, double-blind, placebo-controlled trial. Lancet Respir Med. 2016;4(1):19–26.
9. DiazGranados CA, Dunning AJ, Kimmel M, Kirby D, Treanor J, Collins A, et al. Efficacy of high-dose versus standard-dose influenza vaccine in older adults. N Engl J Med. 2014;371(7):635–45.
10. Bhide P, Srikantharajah A, Lanz D, Dodds J, Collins B, Zamora J, et al. TILT: time-lapse imaging trial - a pragmatic, multi-centre, three-arm randomised controlled trial to assess the clinical effectiveness and safety of time-lapse imaging in in vitro fertilisation treatment. Trials. 2020;21(1):600.
11. Makrides M, Best K, Yelland L, McPhee A, Zhou S, Quinlivan J, et al. A randomized trial of prenatal n-3 fatty acid supplementation and preterm delivery. N Engl J Med. 2019;381(11):1035–45.
12. Morris TP, White IR, Crowther MJ. Using simulation studies to evaluate statistical methods. Stat Med. 2019;38(11):2074–102.
13. Wooldridge JM. Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press; 2010.
14. Huang Y, Leroux B. Informative cluster sizes for subcluster-level covariates and weighted generalized estimating equations. Biometrics. 2011;67(3):843–51.
15. Seaman S, Pavlou M, Copas A. Review of methods for handling confounding by cluster and informative cluster size in clustered data. Stat Med. 2014;33(30):5371–87.
16. Seaman SR, Pavlou M, Copas AJ. Methods for observed-cluster inference when cluster size is informative: a review and clarifications. Biometrics. 2014;70(2):449–56.
17. Sullivan Pepe M, Anderson GL. A cautionary note on inference for marginal regression models with longitudinal data and general correlated response data. Commun Stat Simul Comput. 1994;23(4):939–51.
18. Williamson JM, Datta S, Satten GA. Marginal analyses of clustered data when cluster size is informative. Biometrics. 2003;59(1):36–42.
19. Yelland LN, Sullivan TR, Pavlou M, Seaman SR. Analysis of randomised trials including multiple births when birth size is informative. Paediatr Perinat Epidemiol. 2015;29(6):567–75.
20. Morris TP, Kahan BC, White IR. Choosing sensitivity analyses for randomised trials: principles. BMC Med Res Methodol. 2014;14:11.
21. Cho G, Anie KA, Buckton J, Kiilu P, Layton M, Alexander L, et al. SWIM (sickle with ibuprofen and morphine) randomised controlled trial fails to recruit: lessons learnt. BMJ Open. 2016;6(6):e011276.
Acknowledgements
Not applicable
Funding
Brennan Kahan is funded by a National Institute for Health Research Doctoral Research Fellowship (DRF201609005). This article presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health. Ian White is grateful for support from the UK Medical Research Council, grants MC_UU_12023/21 and MC_UU_12023/29.
Author information
Affiliations
Contributions
BCK: design and conduct of simulation study, wrote first draft of the manuscript. IRW, SE, RH: input into the design of the simulation study, edited the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Kahan, B.C., White, I.R., Eldridge, S. et al. Independence estimators for re-randomisation trials in multi-episode settings: a simulation study. BMC Med Res Methodol 21, 235 (2021). https://doi.org/10.1186/s12874-021-01433-4
DOI: https://doi.org/10.1186/s12874-021-01433-4
Keywords
 Re-randomisation
 Re-randomisation trials
 Independence estimators
 Simulation study
 Estimands