contract the disease naturally. James Jurin evaluated this in the 1720s by collecting data on death rates in three groups: those who were diagnosed with smallpox, those at risk of contracting smallpox and those who had been variolated [22, 23]. The results appeared convincing, with death rates of 16.5% (diagnosed cases), 8.3% (at risk) and 2.0% (variolated) [23]. Preventing smallpox was a much safer practice than letting nature take its course.
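As a rough illustration (a back-of-the-envelope calculation, not part of Jurin's analysis), the reported death rates imply that the risk of dying was roughly eight times higher for diagnosed cases than for the variolated:

```python
# Death rates reported by Jurin, in per cent [23].
rates = {"diagnosed cases": 16.5, "at risk": 8.3, "variolated": 2.0}

# Risk of death relative to variolation (simple ratio of rates).
for group in ("diagnosed cases", "at risk"):
    print(f"{group}: {rates[group] / rates['variolated']:.1f}x the risk")
```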
Death following childbirth was a serious concern in the seventeenth to nineteenth centuries, causing epidemics ‘of unimaginable proportions’ [24]. A major cause of this mortality, puerperal fever (fever following childbirth), was investigated by Ignaz Semmelweis, a Hungarian doctor. In 1844 he compared the death rates among patients in two wards of a hospital in Vienna. He found that the death rate in the ward staffed by doctors was much higher (16%) than in the one run by midwives (2%) [25]. This, and other observations, led Semmelweis to conclude that the illness was transmitted by doctors coming directly from a post‐mortem to help deliver a baby. He initiated a preventive measure, compulsory hand washing in a chloride of lime solution, which reduced the mortality in the doctors’ ward to 3% [25]. His approach was not popular, because it implied that doctors transmitted disease, and Semmelweis's contract was not renewed. He was finally vindicated some 30 years later, when Pasteur identified Streptococcus pyogenes, the bacterium that causes puerperal fever [25].
These treatment evaluations utilised two different types of comparison: contemporary controls and historical controls. Contemporary controls are patients who were seen at the same time as those getting the new treatment, but who received conventional care. Historical controls are patients who had been treated previously in the same location (e.g. hospital). Jurin's comparison of groups at risk of smallpox, and Semmelweis's comparison of puerperal fever in two wards, used contemporary controls. In contrast, the comparison of puerperal fever before and after the introduction of handwashing, and Paré's comparison of treatments for gunshot wounds, used historical control groups.
The problem with both types of control group is that there could be systematic differences between the patients in the different groups. Isaac Massey, a contemporary of Jurin, made this criticism of the work on smallpox, pointing out that those who could afford variolation may have been in better health than those in the comparison groups [22]. He concluded that what was needed were similar groups: they ‘must and ought to be as near as may be on a Par’ [22].
COMPARING SIMILAR GROUPS
When groups are similar at baseline, it is more likely that any differences in subsequent outcomes will be due to differences in the effects of the treatments. The idea of comparing like with like was proposed in the fourteenth century by the poet Francesco Petrarch, who suggested using similar groups of patients to compare the treatments then in use with simply letting nature take its course [26].
One way to create similar groups is to recruit a number of patients who are all alike, then give them different treatments. The testing of potential treatments for scurvy is a widely cited example of the benefit of using similar groups. Scurvy is a debilitating and sometimes fatal disease, which afflicted sailors on long‐distance sea voyages from the fifteenth to the nineteenth centuries [27, 28]. By the late 1500s the benefits of consuming oranges and lemons were well known to Dutch sailors [27], but many English expeditions continued to suffer serious loss of life through scurvy [28]. The issue was still unresolved in 1747, when James Lind, a Royal Navy surgeon, carried out a classic study to assess the effects of six common treatments. He identified 12 sailors with scurvy who were ‘as similar as I could have them’ and tested each of the treatments on groups of two men, each pair receiving one of: oil of vitriol, vinegar, sea water, cider, oranges and lemons, or a herbal paste [29]. After 14 days Lind observed that ‘the most sudden and visible good effects were perceived from the use of oranges and lemons’. These findings were not widely accepted, and even Lind had doubts about them [29, 30], but the method reflects an advance in thinking about ways to test treatments. Lind is rightly celebrated for his comparison of like with like in the evaluation of treatments. (In his ‘Treatise of the Scurvy’ Lind does not make any clear recommendations for the treatment of the disease, possibly because he believed that scurvy was not due to poor diet, but was the result of faulty digestion exacerbated by wet weather [29, 30].)
Another study in the eighteenth century used similar groups to assess whether the adverse effects of variolation (to prevent smallpox) could be ameliorated by pretreatment with a compound of mercury. At that time about 1 in 50 patients died following variolation [31]. In 1767 William Watson recruited 31 children who were similar in age, gender and diet [32]. These were divided into three groups, which received either the mercury mixture, a mild senna laxative or no treatment. No clear difference was found between the groups, using an objective measure of assessment (the number of pock marks caused by the variolation). Watson concluded that variolation against smallpox was effective with or without pretreatment with mercury or a mild laxative [32].
CASTING LOTS AND TREATMENT ALLOCATION
Comparing similar groups of patients was an important step forward in the evaluation of treatments, but it leaves open the possibility that the groups differed on important factors that were not measured. Further, a subconscious bias in the doctor allocating patients to treatments could influence the way individuals were assigned to groups (e.g. the slightly sicker ones might be preferentially assigned to one group). An alternative approach, which prevents this bias, is to allocate individuals to treatments in a truly random way, so that on average the groups will be balanced on all factors, whether measured or not.
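To illustrate the principle, here is a minimal Python simulation (an illustration of the idea, not part of the historical record); the ‘frailty’ score is a hypothetical stand-in for any prognostic factor the doctor cannot see:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Simulate 500 patients, each with an unmeasured 'frailty' score
# (hypothetical; stands in for any unrecorded prognostic factor).
patients = [{"id": i, "frailty": random.gauss(50, 10)} for i in range(500)]

# Allocate each patient by a fair coin toss.
for p in patients:
    p["group"] = "treatment" if random.random() < 0.5 else "control"

# The unmeasured factor is balanced between the groups on average,
# something no systematic allocation by a doctor can guarantee.
for group in ("treatment", "control"):
    scores = [p["frailty"] for p in patients if p["group"] == group]
    print(f"{group}: n={len(scores)}, mean frailty={statistics.mean(scores):.1f}")
```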
The idea that some form of randomisation should be used to allocate patients to treatment groups was proposed in the 1640s. Joan Baptista van Helmont, a Flemish chemist, alchemist and physician, recommended this method to evaluate the effectiveness of bloodletting [33]. He suggested dividing up to 500 patients into two groups, then casting lots (equivalent to tossing a coin) to decide which group would be given the conventional therapy (bloodletting) and which would receive van Helmont's own treatment. A notable feature of the trial design is that the outcome would be decided by the number of funerals in the two groups, an objective outcome measure that was unusual for its time. The experiment was never carried out.
One method of randomised allocation was used in 1848 by Thomas Graham Balfour to investigate whether homeopathic belladonna could prevent scarlet fever. Balfour identified 151 boys who had not had the disease and ‘divided them into two sections, taking them alternately from the list, to prevent the imputation of selection’ [34]. Balfour recognised that if he had to decide which boys were allocated to each group, his choices might be biased. (Alternate selection from a list is essentially a method of randomisation, as the factors related to developing scarlet fever will be randomly scattered throughout the list.) The study showed that exactly two children in each group developed scarlet fever, leading him to conclude that ‘the numbers are too small to justify deductions as to the prophylactic power of belladonna’ [34], a commendably careful interpretation of the findings.
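Expressed as code (a hypothetical sketch; the names are invented, and only the total of 151 boys comes from Balfour's report), alternate allocation from a list requires no judgement from the allocator:

```python
# Balfour-style alternate allocation: boys are taken alternately
# from the list, so the allocator makes no choice about who goes where.
boys = [f"boy_{i:03d}" for i in range(1, 152)]  # 151 boys, as in the study

belladonna_group = boys[0::2]  # 1st, 3rd, 5th, ... from the list
control_group = boys[1::2]     # 2nd, 4th, 6th, ... from the list

print(len(belladonna_group), len(control_group))  # 76 and 75
```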
Instead of alternate selection from a list, patients could be allocated to treatments by the date of their admission to hospital. This method was used by the Danish physician Johannes Fibiger in 1896–1897 [35] to evaluate the effectiveness of a serum treatment for diphtheria: patients admitted to hospital on one day received serum, and those admitted on the next day were untreated. The outcome was persuasive: only 8 of the 239 patients in the serum group died, compared to 30 of the 245 controls.
The use of alternate allocation began to gain popularity in the first few decades of the twentieth century because it prevented bias in the assignment of patients to treatments. These research studies were conducted in both the United States and the UK, with patients being allocated alternately by the order of their attendance at a healthcare facility [36–39]. These trials signalled the growing recognition of the importance of achieving comparable groups.
RANDOM NUMBERS FOR TREATMENT ALLOCATION
A landmark series of