Matches in SemOpenAlex for { <https://semopenalex.org/work/W2045196652> ?p ?o ?g. }
Showing items 1 to 97 of 97, with 100 items per page.
- W2045196652 endingPage "583" @default.
- W2045196652 startingPage "578" @default.
- W2045196652 abstract "“...When something possible has not yet happened, people may tend to expect that it cannot happen in the future.” [1] In this issue of the journal, Alexiev et al. provide the first observational report of a relatively new supraglottic airway [2] which, in their opinion, demonstrated a level of utility worthy of further investigation. This accompanying editorial argues instead that the correct conclusion from their data should be that the device showed a lack of utility (albeit one which merits further study). Some readers may be surprised by this alternative suggestion, especially as the authors found a near-zero incidence of insertion failure. How can it be correct to conclude that something is likely to be ineffective when nearly every instance of its use in a trial results in success? In anaesthetic research it is not unusual to see observational studies rather than randomised controlled trials (RCTs). These are an extension of the single case report, in which there is no parallel control group for the intervention of interest [2–8]. Strictly, there is no ‘hypothesis test’; an observational trial is simply a descriptive report of what happened. Trial endpoints can have continuous values, such as leak pressure in cmH2O, or categorical values, such as ‘success’ vs ‘failure’ (in this case also termed ‘binomial’, as the outcome takes one of only two values). By predefining a certain threshold, continuous variables can be converted to binomial categorical data (e.g. leak pressure > 20 cmH2O is ‘success’; ≤ 20 cmH2O is ‘failure’). In the context of RCTs, the issues of appropriate sample size, the notion of ‘power’ and the interpretation of p values have all been well rehearsed [9]. But this is less so for observational trials, and several interlinked questions arise. First, what is the acceptable failure rate for the device or intervention in question? 
For example, for a supraglottic airway device (SAD), should we regard failure rates of 2%, 10% or 20% as acceptable? Second, what is the appropriate number of patients to include in an observational study? Third, how should we interpret a result where we find no ‘failures’ at all in using the device in a trial? The answers to these questions in fact already exist in numerous other publications [1, 10–12], but the purpose of this editorial is to bring together these conclusions to provide more specific guidance to anaesthesia researchers wishing to conduct or report observational studies. “...a confidence interval may be constructed easily from a zero numerator...” [1] The key data in an observational study like that of Alexiev et al. are often presented as ‘point estimates’, i.e. the specific value that is actually obtained. If we see x failures in n patients, the point estimate of the failure rate is x/n (expressed as a %). However, as practitioners we are more interested in the broader range of expected failure rates and especially in how badly the device might perform (‘the worst we can expect’). The relevant statistic is the upper limit of the 95% confidence interval (CI). We choose 95% CIs because, by convention, 5% is regarded as the level of ‘significance’ for most hypothesis tests. The lower limit of the 95% CI (‘the best we can expect’) is irrelevant because we know that, at its best, any device might perform perfectly. An analogy may be made with an extreme scenario, such as assessing whether keeping one’s eyes shut when crossing a road is a suitable strategy. The lower limit of the 95% CI for death may in fact turn out to be impressively close to a zero death rate, because it is quite possible to reach the other side of even a busy road unharmed. 
The reason safety experts advise against doing this is because the worst we can expect (the upper limit of the 95% CI) is very bad, and they properly disregard the best we can expect (the encouraging lower limit of the 95% CI). There are well-established mathematical formulae for estimating the CIs for binomial data from point estimates (Appendix 1) and they rely upon: (a) the prevailing success/failure rate; and (b) the number of patients studied. Figure 1 uses these binomial CI calculations to show how the upper limit of the 95% CI varies with the number of observations made, given certain prevailing failure rates. The black curve represents a 50% failure rate (akin to tossing a coin, where ‘heads’ is arbitrarily regarded as failure). We can readily see that if we toss only a few coins, the resulting proportion of heads can be much higher than an expected value of 50% but as we toss more and more coins, the resulting proportion more closely approximates 50%. The red and green curves are for hypothetical failure rates of 5% and 2.5%, respectively, and follow the same broad pattern, demonstrating that as the number of observations increases, the upper limit of 95% CI approximates the failure rate itself. Indeed, for all these failure rates it would seem that the upper limit of 95% CI does not change appreciably above sample sizes of ∼50. Or expressed another way: for sample sizes below 50, the CI is very sensitive to the sample size. Graph showing how the upper limit of 95% CI varies for a binomial outcome with the number of observations made (or patients studied) for different prevailing failure rates (black: 50% failure; red: 5% failure; green: 2.5%). Setting acceptable failure rates is always an arbitrary and subjective process, often a result of professional consensus. 
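The binomial CI formulae referred to above (Appendix 1) can be sketched in a few lines of Python. This is a minimal illustration of the appendix's normal ('Wald') approximation, assuming the usual binomial standard error sqrt(p(1 − p)/n) for the point estimate p; exact (Clopper-Pearson) methods are preferable at very low failure counts, where this approximation degenerates:

```python
import math

def wald_upper_95ci(failures: int, n: int) -> float:
    """Upper limit of the 95% CI for a binomial proportion using the
    normal ('Wald') approximation p + 1.96 * SE, with the usual
    binomial standard error SE = sqrt(p * (1 - p) / n)."""
    p = failures / n
    se = math.sqrt(p * (1 - p) / n)
    return min(1.0, p + 1.96 * se)

# One failure in 10 patients: point estimate 10%, but the
# 'worst we can expect' by this approximation is roughly 29%.
print(f"{wald_upper_95ci(1, 10):.3f}")
```

Note that with zero observed failures this approximation returns an upper limit of zero, which is exactly the misleading 'it hasn't failed, so it works' reading warned against here; hence the exact methods discussed below.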
But given that we know the reusable classic laryngeal mask airway has a failure rate of just 1% (or less) [13], then it would seem appropriate to suggest that for any new airway management device designed to replace it, the upper 95% CI failure rate should (liberally) be no more than ∼2.5%. Some even more ‘liberal’ practitioners may suggest that a ‘worst we can expect’ failure rate of up to 5% is acceptable. In contrast, ‘stricter’ practitioners might reasonably insist on maximum figures of just 1–2%. None of this changes the broad thrust of the argument below, because very few colleagues would accept upper failure rates as high as > 10% for novel airway devices [14]. Replotting Fig. 1 so that failure rate is now on the x-axis (Fig. 2) shows how the 95% CI upper limit varies with sample size. Note that the upper 95% CI is not identical for equivalent failure rates. Rather, it is lower as more patients are studied, because larger trials yield more precise estimates with tighter CIs. Therefore, when reporting point estimates for failure (or success) rates, observational trials should also report the 95% CI upper limit using binomial statistics. This would appropriately reflect the precision of the point estimate of the observational trial. Graph showing how the relationship between the 95% CI upper limit and the prevailing failure rate for a binomial outcome is itself dependent upon the number of observations made, or patients studied (red: sample size of 10; black: sample of 30; green: sample of 300). Thus for any given failure rate, the 95% CI upper limit is lower for a larger sample size. Through the intersection of its lines with the y-axis, Fig. 2 also hints at the answer to another question posed earlier: how to interpret trials in which there are no failures. There are many such ‘zero failure’ trials reported in the anaesthetic literature [7, 8, 15]. This question has been discussed extensively before [1, 10, 11]. In Fig. 
3 the 95% CI upper limit for failure rate is plotted against the number of observations where there are no failures (green line). It is clear from this that to be confident of a failure rate < 2.5% (i.e. our predefined threshold for acceptability), our sample size must exceed 150 when we see no failures. In other words, even a zero failure rate in any observational trial of < 150 patients fails to exclude the hypothesis that the device performs poorly. Note that where we report even one failure, our required sample size increases to > 200 (red line in Fig. 3). Upper limit for 95% CI as a function of the number of observations made for a binomial outcome, when there are zero (green), one (red) and 10 (black) failures. The horizontal dashed line indicates the acceptable upper limit for failure rate (2.5%) and its intersections with the other lines indicate the minimum sample sizes necessary to achieve this ideal (e.g. ∼150 observations with zero failures and ∼225 observations with one failure). To avoid such speculation in interpreting zero failures, a trial should be suitably designed to detect and report on at least one failure. Figure 4 shows the sample sizes required to be at least 95% (and ideally 99%) confident of observing at least one failure for different underlying failure rates. These data are calculated from the Poisson distribution, which is closely related to the binomial distribution, and which I have discussed before [10, 11]. In fact, because it can be argued that a 95% CI upper limit for failure rate > 2.5% is unacceptable, much of the x-axis is not relevant and is used only to illustrate the general shape of the relationship. Thus for a device in which the worst we can expect for failure rate should be < 2.5%, Fig. 4 tells us that we need to observe its use in ∼250 patients to be 99% confident of detecting at least one failure. 
This statement is consistent with the last sentence in the paragraph above, and suggests that for a potentially suitable novel airway management device, the minimum sample size for an observational trial should be ∼250. Minimum sample sizes needed to be 95% (black) and 99% (red) confident of observing at least one instance of failure in a trial of a binomial outcome, as a function of the prevailing failure rate. The vertical dotted line indicates the maximum acceptable failure rate of 2.5% and its intersection with the red line suggests that a minimum sample size of ∼250 is appropriate. “... we hope that those fortunate enough to be able to report ‘no problems so far’ will quantify the worst ... that a group of future patients can expect.” [1] I use the following examples purely to illustrate the points made above, noting that in some instances the devices in question may have been subjected to many more trials, including RCTs, and that the results of the papers discussed may have been superseded. My comments relate only to the statistical aspects of the studies and not to the performance of the devices themselves, and furthermore, only to the results taken in isolation rather than in conjunction with other studies of the same or similar devices. Pujol et al. described using a single-use fibreoptic scope in 10 patients, reporting just one failure (10% point estimate for failure rate) [3]. Their (unreported) 95% CI upper limit for failure with this device is in fact 45%, meaning that the worst we can expect from this device is that it fails in almost half of our patients. Beringer et al. studied the paediatric i-gel, reporting 10 first-attempt failures in 120 patients; a point estimate of ∼8% but an (unreported) upper 95% CI limit for failure of 15% [4]. After several attempts, there was just one failure, but even this yields an upper 95% CI of 4.6%, which is unacceptable according to our predefined criteria. 
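The two design rules above can be checked directly. With zero failures, the exact upper limit of the two-sided 95% CI has a closed form, 1 − 0.025^(1/n); and under a Poisson approximation the chance of seeing no failures in n patients at true rate p is about e^(−np), so detecting at least one failure with confidence c needs n ≥ −ln(1 − c)/p. A minimal sketch, assuming the 2.5% acceptability threshold and 99% confidence level used in the text:

```python
import math

def zero_failure_upper_95ci(n: int) -> float:
    """Exact upper limit of the two-sided 95% CI for the failure rate
    when 0 failures are observed in n patients: 1 - 0.025**(1/n)."""
    return 1 - 0.025 ** (1 / n)

def n_to_detect_one_failure(p: float, confidence: float = 0.99) -> int:
    """Smallest n (Poisson approximation) giving
    P(at least one failure) >= confidence when the true failure rate is p."""
    return math.ceil(-math.log(1 - confidence) / p)

# Smallest zero-failure sample whose upper 95% CI limit is still <= 2.5%:
n = 1
while zero_failure_upper_95ci(n) > 0.025:
    n += 1
print(n)                               # 146, i.e. ~150 as read from Fig. 3
print(n_to_detect_one_failure(0.025))  # ~185 under this simple approximation
```

Note that this simple Poisson calculation gives a somewhat smaller minimum (~185 at p = 2.5%) than the ~250 read from Fig. 4; the larger figure is the more conservative, and therefore safer, planning target.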
This underlines the need for large trials when there is even one failure. Hodzovic et al. studied a single-use introducer in 203 patients – an apparently large sample but perhaps not large enough, as they found seven failures even after three attempts [5]. The point estimate of just 3.5% may seem attractive, but its (unreported) 95% CI upper limit is a decidedly poor 7%, which does not justify their conclusion that this device exhibits ‘high success rates’. It is a matter of clinical, rather than statistical, debate whether it is the first-time or overall success rate that should be the relevant endpoint, but if the former, then their upper 95% CI for first-attempt failure is even worse: 22%. However, Aziz and Metz, reporting just one failure in 301 patients using an optical stylet, were justified in describing this as an effective tool. The upper 95% CI for failure is suitably low (1.8%) [6]. Their results seem to support the notion that > 250 patients are needed to confirm a ‘positive’ result in an observational trial. Cook et al. and Gatward et al. (from the same research group) examined the use of two different SADs in two observational studies, each of 100 patients [7, 8]. The results with the (then unreported) 95% CI upper limits are shown in Table 1. Even if the overall success rate within three attempts is taken as the prime measure of performance, the 95% CI upper limits exceed our suggested limit of 2.5% for both devices. However, the approach used by this group does allow for some comparison of devices and in this case, one device does seem to perform somewhat better (Table 1). It is possible to extend their method of separate observational trials appropriately to generate forms of historical control or comparison groups, for which specific statistical techniques have been described and for which forms of meta-analysis can then be undertaken [16, 17]. 
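The unreported upper limits quoted for these studies can be reproduced with the exact (Clopper-Pearson) method: the upper limit is the failure rate p at which observing x or fewer failures among n patients has probability 2.5%. A sketch using a plain binomial CDF and simple bisection:

```python
import math

def binom_cdf(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def exact_upper_95ci(x: int, n: int) -> float:
    """Clopper-Pearson upper limit of the two-sided 95% CI: the p at
    which P(X <= x) = 0.025, found by bisection (CDF falls as p rises)."""
    lo, hi = x / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > 0.025:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# The worked examples from the text (failures/patients):
for x, n in [(1, 10), (10, 120), (7, 203), (1, 301)]:
    print(f"{x}/{n}: point estimate {x/n:.1%} -> upper limit {exact_upper_95ci(x, n):.1%}")
```

Running this reproduces, to within rounding, the figures quoted above: roughly 45% for 1/10, 15% for 10/120, 7% for 7/203 and 1.8% for 1/301.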
“[We] encourage those reporting such observations to consider the maximum risk with which their findings are compatible.” [1] Although RCTs are often regarded as a ‘gold standard’ because they eliminate bias so effectively, it is increasingly apparent that they are not always necessary for all research questions [18–23]. It is not my intention to argue that observational trials of devices are inappropriate. On the contrary, observational studies can yield baseline data, their results can be compared with performance measures set by consensus, and they can form the basis of registries that are increasingly recognised as a useful means of answering pragmatic, clinically relevant questions [22, 23]. However, observational trials must be designed in a proper manner in order to be meaningful. Concerns that observational studies exaggerate the beneficial effects of interventions relate to the imprecision in their conclusions, which in turn often results from inadequate sample sizes. When appropriately designed observational studies are examined, their results can be remarkably similar to those of RCTs [24–26]. Formal sources now offer general guidance on the reporting of observational studies (STROBE: Strengthening the Reporting of Observational studies in Epidemiology [27] and MOOSE: Meta-analysis of Observational Studies in Epidemiology [28]). The considerations above lead to the following conclusions and recommendations. While these do not form ‘journal policy’, it is anticipated that reviewers will find them sufficiently persuasive to apply them generally to all observational studies: observational, single-group studies should incorporate the appropriate elements of STROBE or MOOSE guidelines in their reporting; investigators in observational studies should state what they regard as ‘success’ or ‘failure’ of using the device in question. 
Where the main endpoint is a continuous variable, researchers should define a justifiable threshold so that it can also be described as a categorical (binomial) endpoint; for all results, the upper limit of the 95% CI should be quoted (see Appendix 1); for airway devices such as SADs, a 95% CI upper limit failure rate > 2.5% (‘the worst we can expect’) in the chosen endpoint should generally lead to a conclusion that the device is poor or unsuitable for clinical use, compared with established devices; in planning an observational trial, the minimum sample size should be calculated to yield, with 99% confidence, at least one failure of the device, which generally means that sample sizes should exceed 250 (Figs 3 and 4); observational studies with < 250 patients are unlikely to provide useful evidence in favour of a device’s clinical acceptability, although they may provide useful information suggestive of its lack of acceptability. Researchers and manufacturers involved in observational trials may be surprised to read how even a 100% success rate in a trial of, say, 200 patients may be no proof at all that the device works. They need not, however, be concerned that journal requirements have become in any way more ‘stringent’. In using evidence from observational studies, there are considerations apart from just numbers to take into account. For example, authors’ subjective experiences are also important and, regarding airway devices, observational studies with historical controls can meet the level of evidence (3b) required by the Difficult Airway Society’s ADEPT guidance [29, 30]. Ultimately, increased collaboration between units is an important and increasingly necessary way of improving the strength of evidence from observational studies, by making it easier to achieve the required number of observations [29–31]. No external funding or competing interests declared. 
JJP is Scientific Officer of the Difficult Airway Society but the views expressed in this article are personal and do not reflect any official views or policy of that society. 95% CIs are then estimated as running from μ − 1.96 × SE (lower limit) to μ + 1.96 × SE (upper limit) [32]. Many online calculators assist in the calculations (e.g. see http://statpages.org/confint.html)." @default.
- W2045196652 created "2016-06-24" @default.
- W2045196652 creator A5000855919 @default.
- W2045196652 date "2012-05-07" @default.
- W2045196652 modified "2023-10-17" @default.
- W2045196652 title "If it hasn’t failed, does it work? On ‘the worst we can expect’ from observational trial results, with reference to airway management devices" @default.
- W2045196652 cites W1494828475 @default.
- W2045196652 cites W1534348342 @default.
- W2045196652 cites W1560132489 @default.
- W2045196652 cites W1584570151 @default.
- W2045196652 cites W1596718365 @default.
- W2045196652 cites W1628382290 @default.
- W2045196652 cites W1817531244 @default.
- W2045196652 cites W1822873143 @default.
- W2045196652 cites W1979423827 @default.
- W2045196652 cites W19849081 @default.
- W2045196652 cites W1988155227 @default.
- W2045196652 cites W2000456697 @default.
- W2045196652 cites W2014521622 @default.
- W2045196652 cites W2020872175 @default.
- W2045196652 cites W2028567471 @default.
- W2045196652 cites W2044004985 @default.
- W2045196652 cites W2050128319 @default.
- W2045196652 cites W2058100042 @default.
- W2045196652 cites W2071261338 @default.
- W2045196652 cites W2073171347 @default.
- W2045196652 cites W2073715629 @default.
- W2045196652 cites W2103739012 @default.
- W2045196652 cites W2119186079 @default.
- W2045196652 cites W2149819031 @default.
- W2045196652 cites W2161523522 @default.
- W2045196652 cites W2319983832 @default.
- W2045196652 cites W3022423927 @default.
- W2045196652 cites W4254573229 @default.
- W2045196652 cites W4320300927 @default.
- W2045196652 doi "https://doi.org/10.1111/j.1365-2044.2012.07155.x" @default.
- W2045196652 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/22563955" @default.
- W2045196652 hasPublicationYear "2012" @default.
- W2045196652 type Work @default.
- W2045196652 sameAs 2045196652 @default.
- W2045196652 citedByCount "62" @default.
- W2045196652 countsByYear W20451966522012 @default.
- W2045196652 countsByYear W20451966522013 @default.
- W2045196652 countsByYear W20451966522014 @default.
- W2045196652 countsByYear W20451966522015 @default.
- W2045196652 countsByYear W20451966522016 @default.
- W2045196652 countsByYear W20451966522017 @default.
- W2045196652 countsByYear W20451966522018 @default.
- W2045196652 countsByYear W20451966522019 @default.
- W2045196652 countsByYear W20451966522020 @default.
- W2045196652 countsByYear W20451966522021 @default.
- W2045196652 countsByYear W20451966522022 @default.
- W2045196652 crossrefType "journal-article" @default.
- W2045196652 hasAuthorship W2045196652A5000855919 @default.
- W2045196652 hasBestOaLocation W20451966521 @default.
- W2045196652 hasConcept C105922876 @default.
- W2045196652 hasConcept C126322002 @default.
- W2045196652 hasConcept C127413603 @default.
- W2045196652 hasConcept C177713679 @default.
- W2045196652 hasConcept C18762648 @default.
- W2045196652 hasConcept C23131810 @default.
- W2045196652 hasConcept C2780978852 @default.
- W2045196652 hasConcept C42219234 @default.
- W2045196652 hasConcept C71924100 @default.
- W2045196652 hasConcept C78519656 @default.
- W2045196652 hasConceptScore W2045196652C105922876 @default.
- W2045196652 hasConceptScore W2045196652C126322002 @default.
- W2045196652 hasConceptScore W2045196652C127413603 @default.
- W2045196652 hasConceptScore W2045196652C177713679 @default.
- W2045196652 hasConceptScore W2045196652C18762648 @default.
- W2045196652 hasConceptScore W2045196652C23131810 @default.
- W2045196652 hasConceptScore W2045196652C2780978852 @default.
- W2045196652 hasConceptScore W2045196652C42219234 @default.
- W2045196652 hasConceptScore W2045196652C71924100 @default.
- W2045196652 hasConceptScore W2045196652C78519656 @default.
- W2045196652 hasIssue "6" @default.
- W2045196652 hasLocation W20451966521 @default.
- W2045196652 hasLocation W20451966522 @default.
- W2045196652 hasOpenAccess W2045196652 @default.
- W2045196652 hasPrimaryLocation W20451966521 @default.
- W2045196652 hasRelatedWork W1942234977 @default.
- W2045196652 hasRelatedWork W2043565652 @default.
- W2045196652 hasRelatedWork W2274624217 @default.
- W2045196652 hasRelatedWork W2301156551 @default.
- W2045196652 hasRelatedWork W3119715502 @default.
- W2045196652 hasRelatedWork W316293832 @default.
- W2045196652 hasRelatedWork W4226217760 @default.
- W2045196652 hasRelatedWork W4318965102 @default.
- W2045196652 hasRelatedWork W4320924787 @default.
- W2045196652 hasRelatedWork W54350062 @default.
- W2045196652 hasVolume "67" @default.
- W2045196652 isParatext "false" @default.
- W2045196652 isRetracted "false" @default.
- W2045196652 magId "2045196652" @default.
- W2045196652 workType "article" @default.