Matches in SemOpenAlex for { <https://semopenalex.org/work/W2995784628> ?p ?o ?g. }
Showing items 1 to 48 of 48, with 100 items per page.
- W2995784628 endingPage "215" @default.
- W2995784628 startingPage "211" @default.
- W2995784628 abstract "Statistical errors are surprisingly common in the biomedical literature and contribute to the reproducibility crisis.1 Some errors are impossible to spot without access to the underlying dataset, but many are detectable from information available in the paper. You don't need a degree in statistics to catch most of these errors. Common sense and simple arithmetic are often all that's required. In addition, there are an increasing number of free, easy-to-use online statistical sleuthing tools that facilitate error detection. This article reviews some simple techniques and tools you can use to catch errors when reviewing others' papers or when double-checking your own work. In some cases, reported statistics and results don't make sense at face value. Thus, one of the best tools in the statistical detective's toolkit is common sense. For example, I was once reviewing a meta-analysis of a nonsurgical treatment for knee pain that reported a pooled effect size of a 4.04-standard deviation (SD) reduction in pain in treated patients (95% confidence interval: 2.81 to 5.26). This is an implausibly large effect size. Note that a 0.8 SD effect size is commonly used as a benchmark for a “large” effect. Figure 1 illustrates what the pre- and posttreatment distributions of pain scores would have to look like to achieve a 4.04-SD reduction in pain. Most patients would have to have very high pain scores before treatment and very low pain scores after treatment. (Alternatively, the pretreatment variability in pain scores would have to be implausibly low.) It's unlikely that any treatment for knee pain can achieve such a consistent and dramatic reduction in pain. A little further statistical detective work revealed the error in this case: When performing the meta-analysis, the authors had erroneously plugged in standard errors rather than SDs from the original papers. 
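The implausibility of a 4.04-SD effect can be quantified with the common-language effect size, Φ(d/√2): the probability that a randomly chosen treated patient reports less pain than a randomly chosen untreated patient, assuming two normal distributions separated by d SDs. The sketch below is my own illustration (not the original meta-analysis data); it contrasts the reported effect with the conventional "large" benchmark and shows why confusing standard errors with SDs inflates d by a factor of √n.

```python
from math import sqrt
from scipy.stats import norm

def prob_superiority(d):
    """Common-language effect size for two normal distributions
    separated by d SDs: P(random treated score beats random control
    score) = Phi(d / sqrt(2))."""
    return norm.cdf(d / sqrt(2))

print(prob_superiority(0.8))   # "large" benchmark: ~0.71
print(prob_superiority(4.04))  # reported effect: >0.99, near-total separation

# Plugging in the standard error (SD / sqrt(n)) instead of the SD
# shrinks the denominator of the effect size, inflating d by sqrt(n).
# Illustrative numbers (not from the meta-analysis): a true d of 0.8
# in studies of ~25 patients would masquerade as d = 4.0.
d_true, n = 0.8, 25
print(d_true * sqrt(n))
```

Under these illustrative assumptions, a genuinely large effect pooled with SEs in place of SDs lands right around the implausible value the meta-analysis reported.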
Many important statistical errors can be revealed through simple arithmetic. When numbers within or across tables and figures don't add up, this may reveal larger problems with data management or analysis. For example, Figure 2 shows a table from a published paper that contains numerous statistical and numerical inconsistencies.2 This paper was brought to my attention because the second author has had 17 other papers retracted to date.3 The paper explored the effects of labeling physical activity as “fun” vs “exercise” on participants' subsequent food consumption. A quick scan of the table (Figure 2) reveals simple numerical problems that should have been caught during peer review. For example, though each group had 28 participants, a sample size of n = 29 is listed for the exercise framing group in the “Drink chosen” column. The explanation reveals an error: n = 29 refers to the number of drinks rather than the number of people; and the authors calculated the mean calories per drink rather than the quantity of interest: the mean drink calories per person. (The calories per drink from four cokes of 240 calories each and 25 waters yields the exact mean and SD listed in the table: 33.10 and 88.22, respectively.) Many other quantities do not add up. For example, the total meal is composed of the main meal plus drinks and dessert. Thus, the average “Total meal” calories (column 1) should equal the sum of the mean “Regular meal” calories (column 2) plus the mean “Hedonic” (drink and dessert) calories (column 3), if there are no missing data. For the exercise framing group, no missing data are reported (n = 28 in all three columns), but 193.67 + 133.98 = 327.65, which does not equal the reported mean total calories of 313.23. Similarly, the average hedonic calories (column 3) should be calculable from the dessert (columns 4 and 5) and drinks (column 6). 
But (33.10*29 + 75.14*16 + 135.30*11)/28 = 130.37, which does not equal 133.98. These inconsistencies could be due to simple transcribing errors, or they may point to larger problems with the dataset and analysis. In this case, there is reason to suspect the latter, given that numerous other papers by the same author have been retracted due to problems with data integrity and the misuse of statistics. Recent years have seen the development of easy-to-use, online web applications designed to detect statistical inconsistencies in papers, such as statcheck and GRIM. One can also check statistics for internal consistency using online calculators that perform statistical tests from summary statistics. Note that identifying an inconsistency does not necessarily indicate a statistical error or manipulation; further inspection is needed to determine the source of the inconsistency. Statcheck is a statistical package in R that was developed by Michèle Nuijten and Sacha Epskamp4 and is also available as an easy-to-use web application: http://statcheck.io/.5 The program automatically extracts statistics from papers and checks them for internal consistency. For example, in the fun vs exercise framing paper,2 the mean hedonic calories reported was 133.98 in the exercise framing group vs 94.45 in the fun framing group (Figure 2). The authors report that this is a significant difference, and in the text they give the relevant test statistic, degrees of freedom, and P value as F(1, 50) = 2.791, P < .05. Statcheck automatically extracts these numbers from the text and then checks whether an F statistic of 2.791 with numerator degrees of freedom of 1 and denominator degrees of freedom of 50 actually yields a P value less than .05. Statcheck users can upload the paper as a pdf, html, or docx file, and statcheck promptly returns a spreadsheet with all the extracted statistics. 
When I uploaded the fun vs exercise paper, statcheck extracted 11 test statistics, 3 of which it flagged as inconsistent, including the F(1,50) = 2.791, P < .05 statistic. There is an explanation for this inconsistency, but the explanation reveals a statistical sleight of hand. The correct two-sided P value here is P = .101. But the authors reported a one-sided P value rather than a two-sided P value (and also rounded P = .05 down to P < .05). They did indicate “one-sided P value” in the text, but they gave no justification for this choice. Further inspection reveals that they selectively reported one-tailed P values whenever the two-tailed P value was >.05 but the one-tailed P value was ≤.05. This is misleading to readers. Statcheck has some limitations. It can only check statistics that are reported in the text (not tables) and in which the test statistic, degrees of freedom, and P value are all available. This reporting format is common in the psychology literature but less common in other domains of biomedicine. If statcheck is not appropriate for a given paper, one may also manually extract summary statistics from a paper and check them for consistency using online calculators. For example, entering an F statistic of 2.791 and degrees of freedom of 1 (numerator) and 50 (denominator), into this online F distribution calculator: https://stattrek.com/online-calculator/f-distribution.aspx,6 returns a P value of P = .101. Similarly, one can enter the means, SDs, and sample sizes of two groups into this t-test calculator: https://www.graphpad.com/quickcalcs/ttest1/?Format=SD,7 and the program will calculate the corresponding P value. When I enter a mean of 133.98, SD of 111, and sample size of 28 for the exercise framing group, and a mean of 94.45, SD of 35.48, and sample size of 24 for the fun framing group (statistics available in the table), the calculator returns a P value of P = .101. 
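The consistency checks these online calculators perform can also be scripted. The sketch below is my own re-implementation using scipy (not statcheck's or the calculators' code): it recomputes the two-sided P value twice, once from the reported F statistic and once from the reported group summary statistics, and both routes should land near P = .101.

```python
from scipy import stats

# P value from the reported test statistic: F(1, 50) = 2.791.
# The survival function gives the upper-tail probability.
p_from_f = stats.f.sf(2.791, dfn=1, dfd=50)
print(round(p_from_f, 3))  # ~0.101, i.e., not significant two-sided

# P value from summary statistics (pooled two-sample t test):
# exercise framing: mean 133.98, SD 111,   n = 28
# fun framing:      mean  94.45, SD 35.48, n = 24
t_stat, p_from_summary = stats.ttest_ind_from_stats(
    mean1=133.98, std1=111.0, nobs1=28,
    mean2=94.45, std2=35.48, nobs2=24,
    equal_var=True)
print(round(p_from_summary, 3))  # ~0.101; note t**2 ~ reported F
```

For a two-group comparison, the F(1, df) statistic is the square of the pooled t statistic, which is why the two routes agree.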
The GRIM (Granularity-Related Inconsistent Means) test, which was developed by James Heathers and Nicholas Brown, flags impossible mean values.8 For example, if pain is rated on a 0-10 integer scale, and the study has 10 participants, then the mean must contain an even tenth—for example, 4.00, 5.10, or 6.30—because the mean is a whole number divided by 10. Thus, if the paper reports a mean of 5.33, something is amiss. GRIM is implemented in an online calculator: http://www.prepubmed.org/grim_test/ 9; one simply enters the reported mean and sample size, and GRIM returns consistent or inconsistent. For the fun vs exercise framing paper, the mean reported hedonic calories in the exercise framing group is 133.98. When I plug 133.98 and n = 28 into the GRIM calculator, the mean is flagged as inconsistent. Note that this mean is inconsistent only if calories were recorded as whole numbers in the dataset; if each individual's hedonic calories were reported to one or more decimal places, then 133.98 is a possible value. Participants in this study also rated their excitement for the activity on a 9-point Likert scale (1 = not exciting, 9 = extremely exciting). The reported mean for the exercise framing group is 7.19 and for the fun framing group is 6.45. If I plug means of 7.19 or 6.45 into the GRIM calculator with sample sizes of n = 28, the means are flagged as inconsistent. Near these reported values, the only means possible with a sample size of 28 are 7.18 or 7.21 (not 7.19) and 6.43 or 6.46 (not 6.45). Missing data points could potentially explain the discrepancy, as missing data would change the sample size, but the reported degrees of freedom for this comparison indicate that there were no missing data points. The GRIM test was instrumental in uncovering a slew of numerical inconsistencies in a number of papers by the same author; these papers were later retracted.10 Authors rarely provide access to their raw data, but it may be possible to access some raw data from plots or images. 
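The GRIM logic itself is simple enough to script: a mean reported to two decimals is consistent only if some integer total divided by n rounds to it. The function below is my own minimal sketch of the idea (not the prepubmed.org code), assuming the underlying observations were recorded as whole numbers.

```python
def grim_consistent(reported_mean, n, decimals=2):
    """GRIM test: can any integer total of n integer-valued
    observations yield this rounded mean? Only integer totals near
    reported_mean * n need to be checked."""
    target = f"{reported_mean:.{decimals}f}"
    base = int(round(reported_mean * n))
    # String formatting sidesteps float-equality pitfalls.
    return any(f"{k / n:.{decimals}f}" == target
               for k in range(base - 2, base + 3))

print(grim_consistent(133.98, 28))  # False: flagged as inconsistent
print(grim_consistent(7.19, 28))    # False: no integer total works
print(grim_consistent(7.18, 28))    # True: 201/28 rounds to 7.18
```

Running it on the paper's values reproduces the calculator's verdicts: 133.98, 7.19, and 6.45 are all impossible with n = 28, while the neighboring values 7.18, 7.21, 6.43, and 6.46 are possible.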
The free online tool WebPlotDigitizer, https://apps.automeris.io/wpd/,11 extracts precise values from plots, thus creating a downloadable dataset that permits reanalysis. The tool can retrieve X and Y values from scatter plots, means and confidence intervals from forest plots, percentages from a histogram or bar chart, and distance and angle values from pictures. WebPlotDigitizer is fast and easy to use: Upload the image into the application, and then either manually select the points of interest or use an automatic filter to extract them. (I have found that a combination of automatic plus manual is optimal.) I frequently use WebPlotDigitizer when reviewing papers to gauge the robustness of a correlation or regression result, examine the distribution of a variable, or double-check a reported statistic. For example, Figure 3 shows my reanalysis of a reported correlation that I suspected was not robust. The authors reported a significant correlation of r = .58 (P = .001) between plasma tenofovir and albumin concentrations in HIV-infected women12; however, based on their published scatter plot, I suspected that this association was driven by a single outlier (Figure 3). I used WebPlotDigitizer to extract the underlying data from the scatter plot. After reproducing the original graph and statistic, I then: (1) calculated a Spearman correlation coefficient, which is less influenced by outliers (Figure 3B), and (2) replotted and reanalyzed the data after removing the outlier (Figure 3). Without the outlier, there is no notable association between the two variables, suggesting that the association is spurious or that albumin is increased only with unusually high levels of tenofovir. Computer simulation is a powerful tool for checking statistics and statistical claims. In a simulation, the analyst generates data with known properties in order to answer questions about the data or about the behavior of a statistic that might be applied to that data. 
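Once the points have been digitized, this kind of robustness check takes only a few lines. The sketch below uses made-up data (not the digitized tenofovir values) to illustrate the pattern: a single extreme point can manufacture a large Pearson correlation that neither the rank-based Spearman coefficient nor a reanalysis without the outlier supports.

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical digitized scatter-plot data: 10 essentially
# uncorrelated points plus one extreme outlier at (30, 30).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
y = [6, 1, 8, 3, 10, 5, 7, 2, 9, 4, 30]

r_all, _ = pearsonr(x, y)             # inflated by the outlier (~0.88)
rho_all, _ = spearmanr(x, y)          # rank-based, less influenced (~0.32)
r_trim, _ = pearsonr(x[:-1], y[:-1])  # outlier removed (~0.09)
print(round(r_all, 2), round(rho_all, 2), round(r_trim, 2))
```

The same three numbers (Pearson with the outlier, Spearman, Pearson without the outlier) are exactly the comparison made in the reanalysis described above.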
For example, I used computer simulation to generate the data pictured in Figure 1 of this article. I generated pain scores from different distributions to determine what distributions for pre- and posttreatment scores could yield a mean change of 4.04 standard deviations. In another example, I used computer simulation to demonstrate that a statistical method that has been widely applied in sports science and medicine yields unacceptably high Type I error rates.13 Running computer simulations does require statistical programming skills (such as in R or SAS). However, many simulations are easy to learn and implement with a little statistical programming know-how. For a lengthier discussion of computer simulation, see Sainani.14 To help reduce the number of statistical and numerical errors that plague the biomedical literature, researchers who write papers and perform peer review need to act as statistical detectives. Many errors are surprisingly easy to detect with nothing more than common sense, simple arithmetic, and a few free online tools reviewed in this article. Computer simulation is also a powerful error detection tool, though it does require some programming skills. Error detection would be much easier if more authors made their datasets freely available." @default.
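As a concrete example of the simulation approach the article describes, the sketch below (a generic illustration, not the sports-science analysis from reference 13) estimates the Type I error rate of the ordinary two-sample t test: both groups are drawn from the same distribution, so the null hypothesis is true and a well-behaved test at α = .05 should reject about 5% of the time.

```python
import numpy as np
from scipy import stats

# Monte Carlo estimate of a test's Type I error rate: draw both
# groups from the SAME distribution (the null is true), run the test
# many times, and count how often p < alpha.
rng = np.random.default_rng(42)
reps, n, alpha = 10_000, 20, 0.05
false_positives = 0
for _ in range(reps):
    a = rng.normal(loc=0, scale=1, size=n)
    b = rng.normal(loc=0, scale=1, size=n)
    if stats.ttest_ind(a, b).pvalue < alpha:
        false_positives += 1
print(false_positives / reps)  # close to 0.05 for a valid test
```

Swapping in a different analysis method at the `ttest_ind` step is how one would show, as in reference 13, that a procedure rejects far more often than its nominal α.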
- W2995784628 created "2019-12-26" @default.
- W2995784628 creator A5084557946 @default.
- W2995784628 date "2020-01-18" @default.
- W2995784628 modified "2023-09-27" @default.
- W2995784628 title "How to Be a Statistical Detective" @default.
- W2995784628 cites W1910875068 @default.
- W2995784628 cites W2086053590 @default.
- W2995784628 cites W2219946537 @default.
- W2995784628 cites W2403850581 @default.
- W2995784628 cites W2489655347 @default.
- W2995784628 cites W2790042388 @default.
- W2995784628 cites W2799354595 @default.
- W2995784628 doi "https://doi.org/10.1002/pmrj.12305" @default.
- W2995784628 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/31850680" @default.
- W2995784628 hasPublicationYear "2020" @default.
- W2995784628 type Work @default.
- W2995784628 sameAs 2995784628 @default.
- W2995784628 citedByCount "1" @default.
- W2995784628 countsByYear W29957846282023 @default.
- W2995784628 crossrefType "journal-article" @default.
- W2995784628 hasAuthorship W2995784628A5084557946 @default.
- W2995784628 hasBestOaLocation W29957846281 @default.
- W2995784628 hasConcept C71924100 @default.
- W2995784628 hasConcept C99508421 @default.
- W2995784628 hasConceptScore W2995784628C71924100 @default.
- W2995784628 hasConceptScore W2995784628C99508421 @default.
- W2995784628 hasIssue "2" @default.
- W2995784628 hasLocation W29957846281 @default.
- W2995784628 hasOpenAccess W2995784628 @default.
- W2995784628 hasPrimaryLocation W29957846281 @default.
- W2995784628 hasRelatedWork W1506200166 @default.
- W2995784628 hasRelatedWork W1995515455 @default.
- W2995784628 hasRelatedWork W2048182022 @default.
- W2995784628 hasRelatedWork W2080531066 @default.
- W2995784628 hasRelatedWork W2604872355 @default.
- W2995784628 hasRelatedWork W2748952813 @default.
- W2995784628 hasRelatedWork W2899084033 @default.
- W2995784628 hasRelatedWork W3031052312 @default.
- W2995784628 hasRelatedWork W3032375762 @default.
- W2995784628 hasRelatedWork W3108674512 @default.
- W2995784628 hasVolume "12" @default.
- W2995784628 isParatext "false" @default.
- W2995784628 isRetracted "false" @default.
- W2995784628 magId "2995784628" @default.
- W2995784628 workType "article" @default.