Table 3 Brief description of the 16 approaches for quantitative benefit-harm assessment

From: A framework for organizing and selecting quantitative approaches for benefit-harm assessment

Benefit-less-risk analysis (BLRA) Researchers developed benefit-less-risk analysis, which combines benefit and harm into a single metric, primarily for clinical trials [36]. This analysis takes advantage of individual patient data. For each patient, researchers record the benefit (yes or no) and express the harm as a value between 0 and 1. Researchers present the relationship between benefit and risk as risk subtracted from benefit, which allows for statistical testing of comparisons between treatment groups. Patient preferences expressing the relative importance of benefit and harm outcomes can be considered. Because the method relies on individual patient data, researchers applying it in a systematic review would need to gather such data from the primary studies.
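The per-patient score described in this row can be sketched as follows. This is a minimal illustration, not the procedure from [36]: the patient data are hypothetical, `harm_weight` is an assumed preference weight, and the Welch t statistic is just one possible way to compare groups.

```python
from math import sqrt
from statistics import mean, variance


def blr_scores(benefit, harm, harm_weight=1.0):
    """Per-patient score: benefit (0 or 1) minus weighted harm (0 to 1).

    harm_weight is an assumed preference weight, not taken from the source.
    """
    return [b - harm_weight * h for b, h in zip(benefit, harm)]


def welch_t(x, y):
    """Welch t statistic for the difference in mean scores between two groups."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(x) - mean(y)) / se


# Hypothetical individual patient data (illustrative only)
treated = blr_scores([1, 1, 0, 1, 1, 0], [0.2, 0.0, 0.5, 0.1, 0.3, 0.0])
control = blr_scores([0, 1, 0, 0, 1, 0], [0.0, 0.1, 0.0, 0.2, 0.0, 0.0])

diff = mean(treated) - mean(control)  # mean benefit-less-risk difference
t_stat = welch_t(treated, control)    # test statistic for that difference
print(f"difference in mean score: {diff:.3f}, Welch t: {t_stat:.2f}")
```

A positive difference favors the treated group; whether it is statistically convincing depends on the test and sample size chosen.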
Boers’ 3x3 table This quantitative approach does not require any statistical models but suggests a way of organizing outcome data on the same scale [23]. Researchers need individual patient data. They split the outcomes of patients into three categories and display the number of patients with a certain benefit-harm profile (e.g. major benefit and minimal harm) in a 3x3 table. Researchers do not consider treatment effects directly since they construct separate 3x3 tables for each treatment group. As a consequence, no measures of uncertainty are available. Researchers do not consider patient preferences, but instead the clinicians’ view or agreement of what constitutes minimal, moderate, or major benefit or harm. The method is feasible for both single trials and systematic reviews. A disadvantage is that, although each table is simple and easy to read, readers must somehow estimate treatment effects across tables themselves or supply their own benefit and harm comparison metric, so the method challenges rather than facilitates conclusions concerning benefits and harms.
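As a rough sketch of how one such table could be tabulated from individual patient data, assume each patient has already been classified by clinicians into a (benefit, harm) pair; the patient list below is hypothetical.

```python
from collections import Counter

LEVELS = ("minimal", "moderate", "major")


def boers_table(patients):
    """Count patients per (benefit, harm) profile.

    Rows are benefit levels and columns are harm levels, both in LEVELS order.
    """
    counts = Counter(patients)
    return [[counts[(b, h)] for h in LEVELS] for b in LEVELS]


# Hypothetical treatment group: one (benefit, harm) pair per patient
group = [
    ("major", "minimal"),
    ("major", "minimal"),
    ("moderate", "moderate"),
    ("minimal", "major"),
    ("major", "moderate"),
]
table = boers_table(group)

for level, row in zip(LEVELS, table):
    print(f"{level:>8} benefit: {row}")
```

One table is built per treatment group, which is why the approach yields no direct treatment effect estimate.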
Gail/National Cancer Institute This is one of the most comprehensive approaches for benefit and harm assessment and considers various data sources to balance the benefits and harms of a treatment [3]. As described above, researchers can calculate the benefit and harm comparison metric as the sum of benefit and harm outcome rates per patient profile. They can incorporate patient preferences by looking only at one severity grade, or by putting weights on outcome rates that reflect patient perception of events: very severe, severe, or moderately severe. The approach does not consider the joint distribution of benefit and harm outcomes but could potentially be extended to do so. However, by looking at benefit and harm comparison estimates across patient profiles, one gets an impression of how the net benefit changes, even qualitatively, as the baseline risk changes. This approach is resource intensive because it considers multiple data sources and multiple outcomes. The United States Preventive Services Task Force used a similar though simplified approach to make recommendations on the use of aspirin for the prevention of myocardial infarction [1]. Similar to the tamoxifen example, researchers estimated the number of benefit (myocardial infarction) and harm (bleedings) events per 1,000 men or women based on observational data and combined these outcome estimates with the evidence on treatment benefits and harms. The benefit-harm comparison metric provided the number of net events (benefit minus harm) prevented or in excess when aspirin is used [1].
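The net-events-per-1,000 calculation mentioned in this row can be illustrated with a simplified sketch. All baseline risks, relative risks, and the `harm_weight` preference weight below are hypothetical placeholders, not values from [1] or [3].

```python
def net_events_per_1000(baseline_benefit_risk, rr_benefit,
                        baseline_harm_risk, rr_harm, harm_weight=1.0):
    """Net events prevented per 1,000 treated people (benefit minus weighted harm).

    rr_benefit < 1 is the relative risk for the outcome the treatment prevents;
    rr_harm > 1 is the relative risk for the outcome the treatment causes.
    """
    prevented = 1000 * baseline_benefit_risk * (1 - rr_benefit)
    caused = 1000 * baseline_harm_risk * (rr_harm - 1)
    return prevented - harm_weight * caused


# Hypothetical patient profiles that differ only in baseline risk of the benefit outcome
profiles = {"low risk": 0.02, "medium risk": 0.06, "high risk": 0.10}
net = {
    name: net_events_per_1000(risk, rr_benefit=0.7,
                              baseline_harm_risk=0.01, rr_harm=1.6)
    for name, risk in profiles.items()
}

for name, value in net.items():
    print(f"{name}: {value:.1f} net events prevented per 1,000")
```

With these placeholder inputs the net benefit grows with baseline risk, which is the qualitative pattern the profile-specific estimates are meant to reveal.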
Incremental net health benefit Incremental net health benefit provides a benefit and harm comparison metric, using QALYs to place one or more benefits and harms on the same scale, and calculates the difference between benefit and harm between treatments (thus a result >0 is favorable) [24, 25]. A key requirement for this approach is the valid measurement of utilities or the sometimes inaccurate transformation of quality of life scores into utilities. Also, it is often difficult to distinguish between the effects of benefits and harms when using utilities.
Multicriteria decision analysis (MCDA) A multicriteria decision analysis allows for systematic decisionmaking in complex situations involving tradeoffs, by considering various harms and benefits associated with treatments [16, 17]. Researchers develop a decision tree model to incorporate benefits from clinical trials and harms such as adverse effects. It allows for input from various stakeholders who may assign different preference weights to the risks and benefits. MCDA represents an approach to reduce the multidimensionality of benefit-harm assessment in a systematic way and makes judgments explicit and transparent. It allows for decisionmaking in the presence of uncertainty and can incorporate data from multiple sources including systematic reviews. The challenges of its application to systematic reviews include getting reliable information on various preferences, reaching agreement on all relevant important benefits and harms and on the relative importance and weighting of these outcomes, and the need to specify a decision context, since systematic reviews are usually conducted to meet the needs of multiple decisionmakers. The flexibility of MCDA also poses challenges for benefit-harm assessment, as systematic reviews are unable to inform on all inputs, especially less tangible inputs (e.g. societal values, opportunity costs) that may alter the balance of harms and benefits in a particular decision context.
Minimum clinical efficacy Minimum clinical efficacy incorporates harm and benefit into a benefit and harm comparison metric. The metric is the difference between efficacy and harm, both of which are expressed on a probability scale by applying relative risk reductions (treatment benefits) and relative risk increases (harms) to absolute probabilities as observed in untreated groups [6, 27]. Researchers consider the intervention as having minimum clinical efficacy if the difference between benefit and harm is positive or above a minimally acceptable threshold. Minimum clinical efficacy can consider relative utilities. A limitation is the inability to provide uncertainty estimates for the benefit and harm comparison metric.
Net clinical benefit Similar to the Gail/National Cancer Institute approach, the calculation of the net clinical benefit considers different data sources such as randomized trials, observational studies, and patient preferences, and provides profile-specific benefit-harm comparison estimates [28]. Researchers calculate the benefit-harm comparison metric as the sum of all expected benefits minus the sum of all expected harms. They calculate the benefit from the pooled relative risk reductions (based on meta-analysis) that they apply to patients at different risk for the benefit outcome (e.g. stroke). They calculate the expected harm from the risks for the harm outcome and the patient preferences for the harm outcome. They calculate net clinical benefit using a Bayesian approach in which they model all steps simultaneously (meta-analysis, calculation of expected benefit, and expected harm). A major advantage of this approach is its flexibility to combine different data sources and place distributions on each parameter. Researchers can thereby quantify uncertainty around the parameters. Net clinical benefit considers patient preferences for different outcomes, but similar to other approaches, the selection of particular values for preferences has a large impact on the net clinical benefit estimates. In the Figure, we categorized the approach as considering only single benefit and harm outcomes because published applications of the approach considered only one benefit and one harm outcome. But the approach offers, theoretically, enough flexibility to consider multiple outcomes.
Number needed to treat (NNT) and number needed to treat to harm (NNH) The NNT and NNH refer to the number of individuals who need to be treated with the intervention over a specified period of time for one person to benefit (NNT) or to experience the harm (NNH) [12, 13]. NNT and NNH depend on baseline risk (and are thus sensitive to different patient profiles) and the degree of relative risk reduction provided by the intervention (which researchers often assume is constant across the disease spectrum, but which may in fact vary). Researchers can calculate NNT and NNH for single outcomes (e.g. NNT for exacerbations vs. NNH for fractures) or for composite outcomes for both benefit and harm. But since the concept of NNT is one of frequency and not of importance, researchers should only calculate the NNT and NNH ratios or differences for outcomes of similar importance [37]. When researchers calculate a ratio or difference between NNT and NNH as a benefit and harm comparison metric, they assume their independence and may need to extrapolate so that the ratios refer to the same time period. Researchers cannot calculate NNT and NNH for continuous outcomes unless such outcomes are dichotomized. NNT and NNH are perhaps the most widely used measures of risk and benefit reported in systematic reviews and evidence-based medicine. Extensions of the NNT/NNH ratio approach include the threshold NNT, the minimum target event risk for treatment (MERT), and the subject-year adjusted NNT [8]. The threshold NNT reflects the point at which the risks and costs of a clinical intervention balance the benefit, and the minimum target event risk for treatment defines the minimum target event risk at which the intervention is justified. Subject-year adjusted NNT uses subject years as the denominator instead of participants, to better account for time-on-treatment for participants.
For example, if there are two events per 1,000 subject years in the control group and one event per 1,000 subject years in the intervention group, the NNT is 1,000 subject years. This means that with treatment, one fewer event would occur with every 1,000 subject-years. Methods for providing uncertainty for these benefit and harm comparison metrics are available [8]. The NNT/NNH, threshold NNT and MERT all seem feasible within a systematic review context.
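The subject-year example above follows from the standard definition of NNT as the reciprocal of the absolute risk reduction. A minimal sketch: the NNT inputs come from the example in the text, while the harm rates used for NNH are hypothetical.

```python
def nnt(control_rate, treated_rate):
    """Number needed to treat: reciprocal of the absolute risk reduction."""
    arr = control_rate - treated_rate
    if arr <= 0:
        raise ValueError("no absolute risk reduction; NNT is undefined")
    return 1 / arr


def nnh(treated_harm_rate, control_harm_rate):
    """Number needed to treat to harm: reciprocal of the absolute risk increase."""
    ari = treated_harm_rate - control_harm_rate
    if ari <= 0:
        raise ValueError("no absolute risk increase; NNH is undefined")
    return 1 / ari


# From the text: 2 vs. 1 events per 1,000 subject-years
nnt_subject_years = nnt(2 / 1000, 1 / 1000)

# Hypothetical harm rates per subject-year (illustrative only)
nnh_subject_years = nnh(0.015, 0.010)

print(f"NNT: about {nnt_subject_years:.0f} subject-years")
print(f"NNH: about {nnh_subject_years:.0f} subject-years")
```

Both rates must refer to the same time unit (here subject-years) before NNT and NNH are compared as a ratio or difference.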
Probabilistic simulation methods (PSM) PSM use probabilistic simulations, based on Monte Carlo methods, to produce benefit and harm comparison estimates. PSM can incorporate parameters from multiple data sources (systematic reviews of randomized clinical trials and observational studies), patient preferences (e.g. from conjoint analysis) and different patient profiles [7, 29, 32]. This method estimates uncertainty around the benefit and harm comparison estimate, with or without consideration of the joint distribution of benefits and harms (depending on the availability of individual-level data or reporting of covariances).
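A minimal Monte Carlo sketch of this idea is shown below. Every distribution and parameter value is a hypothetical placeholder, and the benefit and harm parameters are drawn independently, so this version does not model their joint distribution.

```python
import random


def simulate_net_benefit(n_draws=10_000, seed=0):
    """Draw net events prevented per 1,000 treated people, sorted ascending.

    All distributions below are illustrative assumptions, not published inputs.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        baseline_benefit = rng.betavariate(60, 940)   # baseline risk of the benefit outcome
        baseline_harm = rng.betavariate(10, 990)      # baseline risk of the harm outcome
        rr_benefit = rng.lognormvariate(-0.36, 0.10)  # relative risk for benefit, around 0.7
        rr_harm = rng.lognormvariate(0.47, 0.20)      # relative risk for harm, around 1.6
        harm_weight = rng.uniform(0.5, 1.5)           # assumed preference weight for harm
        prevented = baseline_benefit * (1 - rr_benefit)
        caused = baseline_harm * (rr_harm - 1)
        draws.append(1000 * (prevented - harm_weight * caused))
    draws.sort()
    return draws


draws = simulate_net_benefit()
median = draws[len(draws) // 2]
lower = draws[int(0.025 * len(draws))]   # approximate 2.5th percentile
upper = draws[int(0.975 * len(draws))]   # approximate 97.5th percentile
prob_net_benefit = sum(d > 0 for d in draws) / len(draws)

print(f"median {median:.1f} (95% interval {lower:.1f} to {upper:.1f})")
print(f"probability of net benefit: {prob_net_benefit:.2f}")
```

Modeling the dependence between benefit and harm would require correlated draws, which in turn needs individual-level data or reported covariances, as the row notes.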
Quantitative framework for risk and benefit assessment (QFRBA) QFRBA reports on benefit and harm separately. It does not provide a benefit and harm comparison metric, and uncertainty estimates for benefit or harm outcomes are only available for the separate treatment effects [5]. An advantage of this method is that it keeps benefit and harm separate, which leaves room for decisionmakers to incorporate preferences and to consider multiple outcomes. Also, QFRBA is probably the way most systematic reviews currently report or discuss the benefit and harm assessment.
(Quality-adjusted) time without symptoms and toxicity (TWiST or Q-TWiST) TWiST compares treatments in terms of the time without symptoms gained versus the time lost due to the experience of adverse effects [14, 15]. It therefore puts the benefit and harm on the same scale (time). Q-TWiST is a further development where time is converted into QALYs [30]. Here the benefit and harm comparison metric is the difference between the drug-associated gain in QALYs and the loss in QALYs associated with the treatment due to adverse effects. Numerous oncology studies have used Q-TWiST. The major advantage of this method is the ability to incorporate patient preferences, which may change over time. The method depends heavily on the availability of measurements that allow estimating the length of time periods without symptoms and of time periods where adverse effects are experienced. Also, measurement instruments need to be highly specific so that a distinction between benefit and harm is possible. For example, quality of life and some preference-based instruments often provide a composite score that already synthesizes the overall experience of a patient. Also, QALYs value health states rather than changes in health states, and the lack of a measure of uncertainty around these measurements may limit the usefulness of this method. In a systematic review, this method may be difficult to apply since QALYs associated with benefit and harm are unlikely to be reported in reports of primary studies.
Risk–benefit contour (RBC) The risk–benefit contour plot is a graphical method to assess benefits and harms [31]. It portrays the probability of benefit for a new treatment compared to another treatment against the probability of harm for that new treatment (again as compared to another treatment). Contour lines portray the shape of this relationship for a number of different probabilities and confidence levels. The risk–benefit contour plot is a way to express uncertainty associated with certain pairs of benefit and harm. The plot conveys study-level relationships, and does not consider the dependence of the probability of benefit and harm at the individual level. Though the method does not incorporate weights (representing patient preferences) for each type of outcome, researchers could adapt it to do so. Researchers should probably view risk–benefit contour as a way to present data and visualize uncertainty. This way researchers can base the underlying analyses that yield the probability estimates on different statistical approaches such as various forms of probabilistic simulation methods (PSM).
Risk–benefit plane (RBP) and risk–benefit acceptability threshold (RBAT) The RBP and RBAT display, in a simple figure, both separate estimates of benefit and harm and a benefit and harm comparison metric [29, 32]. This method does not consider the individual-level dependence between benefit and harm. Using an absolute scale, this method plots the probability of benefit (from a comparison between two treatments) against the probability of harm. With this method, researchers refer to the slope created by a line between the origin and the two-dimensional result as the risk–benefit acceptability threshold. This method does not consider outcome weights that would reflect patient preferences.
Relative value adjusted number needed to treat (RV-NNT) The major advantage of RV-NNT over NNT and NNH is that it allows for incorporation of preferences into the assessment of benefit and harm [6, 27]. Otherwise it offers the same advantages as the NNT/NNH ratio approach, and suffers from some of the same limitations. Systematic reviews would need information on preferences to apply this method.
Stated preference method (SPM) and maximum acceptable risk (MAR) Researchers use SPM and MAR to survey patients on how much burden from adverse effects, or serious adverse events, they are willing to accept in order to experience the benefits of treatment [33, 35, 38, 39]. Researchers need individual patient data for these approaches. The typical method to elicit preferences is discrete choice or conjoint analysis, where respondents have to pick their preferred treatment from two treatment scenarios that characterize the benefit and harm of these treatments. These approaches assume that the attractiveness of a particular treatment is a function of the benefit and harm attributes, which are combined in various ways in the different vignettes of the survey [34].
Transparent uniform risk benefit overview (TURBO) The TURBO diagram displays the factors R and B. Factor R is the sum of the scores for the most serious adverse effect (scored from 1–5) and the second most serious adverse effect (scored from 1–2) [11]. Researchers base the scores on the frequency and severity of the harm outcome. Similarly, factor B is the sum of the scores for the primary benefit (1–5) and the ancillary benefit (1–2), and researchers base the scores on the probability and extent of the benefit outcome. The T score represents the benefit and harm comparison metric and ranges from 1 (high R and low B score) to 7 (high B and low R score). Researchers typically use TURBO diagrams in a regulatory context (e.g. European Medicines Agency) and therefore base them on single trials, but they can also base them on systematic reviews. Researchers can base the factors R and B on absolute or relative measures of treatment effects for which uncertainty estimates are available. But there are no uncertainty estimates for the benefit and harm comparison metric (i.e. the T score). Unlike other approaches, the TURBO diagram explicitly considers not only one but two outcomes for both benefit and harm that are weighted differently (up to 2 or 5 points). Challenges to applying the TURBO method include the arbitrary selection of the two benefit and harm outcomes from a comprehensive list of outcomes and the way researchers assign scores (combining frequency and importance of outcomes).