Evidence in Medicine. Iain K. Crombie
with missing outcome data [47]. Thus it is often not possible to assess whether those lost to follow‐up in the intervention group differ from those lost in the control group.
Bias from Loss to Follow‐up
The impact of loss to follow‐up is difficult to predict. A review found that loss to follow‐up was sometimes higher in the treatment group and sometimes in the comparator [47]. Possibly the effect of loss to follow‐up depends on the specific trial characteristics, such as the illness being treated or the acceptability of the treatment to patients. Whatever the explanation, concern remains that some trials may be biased due to patient attrition.
MISSING OUTCOME DATA AND INTENTION TO TREAT
A difficult issue for the analysis of a trial is how to cope with those who are lost to follow‐up. The accepted approach to this problem is termed ‘intention to treat’. This holds that all patients should be analysed in the groups to which they were randomised, no matter what subsequently happened to them. Intention to treat is regarded as ‘a key defence against bias’ in clinical trials [52].
A popular technique to handle missing data is complete case analysis, which restricts the analysis to those successfully followed up. This approach was used in 26% of trials on musculoskeletal disorders [53], in 45%–54% of studies in general medicine [45, 54] and in 60% of trials in palliative care [51]. Despite its popularity, complete case analysis clearly contravenes the principle of intention to treat, effectively losing the benefit of randomisation. This method is likely to introduce bias, because the number and types of patients lost to follow‐up are unlikely to be the same in the intervention and control groups [55].
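How differential attrition can bias a complete case analysis is easy to see in a small simulation (all numbers here are invented for illustration): a trial with no true treatment effect, in which control patients with poor outcomes drop out more often than treated patients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated trial with no true treatment effect: outcomes are drawn
# from the same distribution in both arms.
n = 1000
control = rng.normal(50, 10, n)
treated = rng.normal(50, 10, n)

# Hypothetical differential attrition: control patients with poor
# outcomes (score <= 45) drop out more often; treated patients drop
# out at random.
control_kept = control[(control > 45) | (rng.random(n) < 0.5)]
treated_kept = treated[rng.random(n) < 0.8]

# Complete case analysis compares only those successfully followed up,
# manufacturing an apparent difference where none exists.
print(round(control.mean() - treated.mean(), 1))            # close to zero
print(round(control_kept.mean() - treated_kept.mean(), 1))  # biased upward
```

Because the patients removed from the control arm are not a random sample of that arm, the groups being compared are no longer the groups created by randomisation.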
Methods of Imputation
Instead of ignoring the problem, as complete case analysis does, a better approach is to use a method to estimate what the missing value might be. This is termed imputation. The simplest form of imputation is last observation carried forward. If trials measure the outcome at more than one time point, the most recent observation is used in the analysis; if there are no intermediate measures the baseline measurement is used. This method is often used in trials [45, 48], but it has been heavily criticised because it is likely to lead to biased estimates of treatment effects [55–57]. As disease severity often changes over time, with relapses or remissions, early values can be poor predictors of the final outcome.
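Last observation carried forward amounts to filling each missing visit with the most recent value available for that patient. A minimal sketch with pandas (the patients and scores are made up):

```python
import pandas as pd

# Hypothetical repeated-measures data: an outcome score at baseline and
# two follow-up visits; NaN marks visits missed after dropout.
scores = pd.DataFrame({
    "patient":  [1, 2, 3],
    "baseline": [50.0, 62.0, 47.0],
    "week_4":   [46.0, None, 44.0],
    "week_8":   [None, None, 41.0],
})

# Last observation carried forward: fill each missing visit with the
# most recent available measurement, falling back to baseline.
locf = scores.set_index("patient").ffill(axis=1)

print(locf.loc[1, "week_8"])  # patient 1's week-4 score is carried forward
print(locf.loc[2, "week_8"])  # patient 2 has only a baseline value to carry
```

The sketch makes the weakness plain: patient 2's "final" outcome is simply the baseline score, which may bear little relation to what would actually have been measured at week 8.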
Multiple imputation is a more sophisticated approach to estimate the missing outcome data. It uses statistical modelling to derive estimates of the missing data based on the available data (imputation). The modelling includes the variables that would normally be used in the final analysis, such as the stratification variables (e.g. centre, gender), treatment group and potential confounders (e.g. initial disease severity, other medical conditions). It can also include other variables (auxiliary variables) that might be available [58, 59]. The idea is that patient characteristics, both medical and demographic, could predict the final outcome. The modelling process is repeated many times to produce an average result (hence multiple imputation).
Multiple imputation makes an assumption about the nature of the missingness of the data. Termed ‘missing at random’, the assumption is that the missing outcomes can be predicted from the other data in the study [60]. It is recommended that sensitivity analysis should be used to explore the effect of assumptions about the missing data [59]. Although not a perfect solution, multiple imputation is better than other methods of dealing with missing data (such as complete case analysis or last observation carried forward) [58].
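The core idea of multiple imputation can be sketched in a few lines: fit a model to the complete cases, draw imputed values around (not exactly at) its predictions, and repeat the process many times before averaging the results. The sketch below uses only simulated data and a deliberately simplified regression imputation; real analyses would use a dedicated package and Rubin's rules for combining estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated trial: the outcome depends on baseline severity and treatment
# arm (true treatment effect = -5); ~25% of outcomes are then lost.
n = 200
baseline = rng.normal(50, 10, n)
arm = rng.integers(0, 2, n)                 # 0 = control, 1 = treatment
outcome = 0.8 * baseline - 5.0 * arm + rng.normal(0, 5, n)
outcome[rng.random(n) < 0.25] = np.nan      # lost to follow-up

def impute_once(baseline, arm, outcome, rng):
    """One stochastic regression imputation of the missing outcomes."""
    obs = ~np.isnan(outcome)
    X = np.column_stack([np.ones(obs.sum()), baseline[obs], arm[obs]])
    coef, *_ = np.linalg.lstsq(X, outcome[obs], rcond=None)
    resid_sd = np.std(outcome[obs] - X @ coef)
    filled = outcome.copy()
    miss = ~obs
    Xm = np.column_stack([np.ones(miss.sum()), baseline[miss], arm[miss]])
    # Draw imputed values around the regression prediction, not at it,
    # so each imputed data set reflects outcome uncertainty.
    filled[miss] = Xm @ coef + rng.normal(0, resid_sd, miss.sum())
    return filled

# Repeat the imputation m times and average the treatment-effect
# estimates (hence 'multiple' imputation).
m = 20
effects = []
for _ in range(m):
    filled = impute_once(baseline, arm, outcome, rng)
    effects.append(filled[arm == 1].mean() - filled[arm == 0].mean())

print(round(float(np.mean(effects)), 1))  # pooled treatment-effect estimate
```

Because the imputation model here uses only baseline severity and treatment arm, it relies on exactly the 'missing at random' assumption described above: the missing outcomes must be predictable from the data that were observed.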
Modified Intention to Treat
The term modified intention to treat (mITT) is commonly used to describe the analysis of trial data [61, 62]. It has no formal definition [63], but usually involves the deliberate exclusion of some participants from the analysis at some time after randomisation. Patients can be excluded for several reasons: the results of the baseline assessment; results of a post‐baseline assessment; the amount of treatment received; or failure to obtain the outcome measures [63]. Individual trials could employ one or more of these reasons to exclude patients. The impact of mITT on estimates of treatment benefit varies: one review found that, compared to intention to treat (ITT) analyses, the modified method inflated treatment effects [64], whereas another study found no difference between ITT and mITT [65]. The practice of excluding patients after randomisation has been widely criticised because of its potential to introduce bias [62, 66, 67].
OTHER METHODOLOGICAL CONCERNS
Unregistered Trials and Bias
Trial registration ‘was introduced in an effort to reduce publication bias and raise the quality of clinical research’ [68]. Although registration is strongly recommended, a recent study showed that only 53% of trials had done so [69]. An analysis of over 1,100 trials explored the factors associated with registration. Compared to registered studies, unregistered trials are more likely to be of lower methodological quality. For example, they are less likely to have a defined primary outcome (48% vs 88%), more likely to have unreported or inadequate allocation concealment (76% vs 55%), more likely to leave blinding status unreported (32% vs 15%), and more likely to omit details of attrition (67% vs 29%) [70]. Even when adjusted for these methodological weaknesses, the unregistered trials had a modest increase in average effect size compared to the registered studies. Another study, which evaluated 322 trials, showed a similar modest inflation of treatment effect estimates [71]. Unregistered trials may therefore give biased estimates of treatment effect.
Small Studies
Small trials often give misleading estimates of treatment benefit. Several review studies, each of which examined hundreds of trials, have shown that, on average, small trials report greater effect sizes than larger ones [72–74]. Two explanations have been suggested for this finding: small studies with negative findings may be less likely to be published, and small studies may be of poorer methodological quality and more prone to bias [73]. Most likely both factors contribute to the bias.
A related phenomenon is that unusually large treatment effects are most commonly reported by small trials [75]. These often occur in the first trial of a new treatment, with subsequent trials showing much smaller effects [76–78]. A possible explanation is that small studies are much more likely to be influenced by the play of chance [79]. A few more events (e.g. deaths) in one treatment group, or a few fewer in the other, can have a large effect on the results of a small study. When averaged across many trials, chance effects cancel out, but in an individual study they can generate large, misleading effect sizes.
The fragility index is used to identify just how susceptible statistically significant results are to the play of chance [80]. It measures how few patients would need a different outcome to change a significant treatment effect into a non‐significant one. Reviews have found that for many trials the index is one, i.e. if a single patient had a different outcome the finding would no longer be statistically significant [81, 82]. In general, the smaller the value of the index, the more fragile the study. Several reviews of trials have reported median values of the index of 1, 2, 3 and 4 [82–84], indicating that, for half of the trials included in these reviews, a different outcome in a few patients would change the statistical significance. Other reviews have found slightly larger median fragility indices of 5 and 8 [80, 85].
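The calculation behind the index can be sketched as follows: switch one patient's outcome at a time in the arm with fewer events, and count the switches needed before a Fisher exact test stops being significant. The trial numbers below are hypothetical, and the Fisher test is implemented from scratch to keep the sketch self-contained.

```python
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the probabilities of all tables with the same margins that are
    no more likely than the observed table."""
    row1, row2, col1 = a + b, c + d, a + c
    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(row1 + row2, col1)
    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))

def fragility_index(events_a, n_a, events_b, n_b, alpha=0.05):
    """How many patients in the arm with fewer events would need a
    different outcome before the result stops being significant."""
    table = [[events_a, n_a - events_a], [events_b, n_b - events_b]]
    if fisher_p(*table[0], *table[1]) >= alpha:
        return 0  # not significant to begin with
    flips = 0
    while fisher_p(*table[0], *table[1]) < alpha:
        low = 0 if table[0][0] <= table[1][0] else 1
        table[low][0] += 1  # one more patient has the event ...
        table[low][1] -= 1  # ... and one fewer does not
        flips += 1
    return flips

# Hypothetical trial: 2/100 events on treatment vs 12/100 on control.
print(fragility_index(2, 100, 12, 100))
```

A small index means the apparent benefit hinges on the outcomes of only a handful of patients, which is exactly the fragility the reviews above describe.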
Low Power
Small studies are often referred to as having low power. In medical research, statistical power refers to the chances (probability) that a study will detect a significant effect of treatment if one truly exists. A power of 80% is recommended, but few trials in medicine achieve this: in an overview of 136,000 trials only 9% of those published between 2010 and 2014 did so [86].
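The relationship between sample size and power can be illustrated with the standard approximation for a two-arm comparison of means (the effect size of 0.3 SD and the sample sizes are chosen for illustration, not taken from the trials cited above):

```python
from math import sqrt, erf

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def two_sample_power(effect_size, n_per_arm):
    """Approximate power of a two-sided 5% z-test comparing the means of
    two equal arms, for a standardised effect size (difference / SD)."""
    se = sqrt(2 / n_per_arm)  # SE of the mean difference, in SD units
    return normal_cdf(abs(effect_size) / se - 1.96)

# A modest effect of 0.3 SD: a trial of 30 per arm has little chance of
# detecting it, while roughly 175 per arm are needed for 80% power.
print(round(two_sample_power(0.3, 30), 2))
print(round(two_sample_power(0.3, 175), 2))
```

The steep dependence on n per arm shows why so few trials reach the recommended 80%: modest effects require sample sizes several times larger than those of a typical small study.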
A consequence of low power is that spuriously significant results are more likely to occur [87]. Another problem is that, if there is a real benefit of treatment, small studies are unlikely to detect the benefit as being