Matches in SemOpenAlex for { <https://semopenalex.org/work/W3201860539> ?p ?o ?g. }
- W3201860539 endingPage "136216" @default.
- W3201860539 startingPage "136182" @default.
- W3201860539 abstract "Information Retrieval (IR) is a discipline deeply rooted in evaluation since its inception. Indeed, experimentally measuring and statistically validating the performance of IR systems are the only possible ways to compare systems and understand which are better than others and, ultimately, more effective and useful for end-users. Since the seminal paper by Stevens [103], it is known that the properties of the measurement scales determine the operations you should or should not perform with values from those scales. For example, Stevens suggested that you can compute means and variances only when you are working with, at least, interval scales. It was recently shown that the most popular evaluation measures in IR are not interval-scaled. However, so far, there has been little or no investigation in IR on the impact and consequences of departing from scale assumptions. Taken to the extremes, it might even mean that decades of experimental IR research used potentially improper methods, which may have produced results needing further validation. Yet it was unclear if and to what extent these findings apply to actual evaluations; this opened a debate in the community, with researchers taking opposite positions on whether this should be considered an issue (or not) and to what extent. In this paper, we first give an introduction to the representational measurement theory explaining why certain operations and significance tests are permissible only with scales of a certain level. For that, we introduce the notion of meaningfulness specifying the conditions under which the truth (or falsity) of a statement is invariant under permissible transformations of a scale. Furthermore, we show how the recall base and the length of the run may make comparison and aggregation across topics problematic.
Then we propose a straightforward and powerful approach for turning an evaluation measure into an interval scale, and describe an experimental evaluation of the differences between the original measures and the interval-scaled ones. For all the regarded measures – namely Precision, Recall, Average Precision, (Normalized) Discounted Cumulative Gain, Rank-Biased Precision and Reciprocal Rank – we observe substantial effects, both on the order of average values and on the outcome of significance tests. For the latter, previously significant differences turn out to be insignificant, while insignificant ones become significant. The effect varies remarkably between the tests considered, but on average we observe a 25% change in the decision about which systems are significantly different and which are not. These experimental findings further support the idea that measurement scales matter and that departing from their assumptions has an impact. This not only suggests that, to the extent possible, it would be better to comply with such assumptions, but it also urges us to clearly indicate when we depart from such assumptions and carefully point out the limitations of the conclusions we draw and under which conditions they are drawn." @default.
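The meaningfulness argument in the abstract can be made concrete with a small example. The sketch below assumes a run length of N = 5 and two hypothetical systems, A and B, scored with Reciprocal Rank on two topics. The scores are invented for illustration and are not taken from the paper's experiments. It also uses a simple rank-based mapping that is assumed here for illustration and is not necessarily the exact construction used by the authors: each RR value is replaced by its position among all values RR can take on a run of that length, rescaled to [0, 1], so that consecutive achievable values become equally spaced. The mapping preserves order, so the two versions carry the same ordinal information, yet the comparison of mean scores flips.

```python
from fractions import Fraction
from statistics import mean


def rr_values(n):
    """All values Reciprocal Rank can take on a run of length n.

    RR is 1/k when the first relevant document is at rank k (1 <= k <= n),
    and 0 when no relevant document is retrieved. Returned in ascending order.
    """
    return sorted({Fraction(0)} | {Fraction(1, k) for k in range(1, n + 1)})


def to_interval(score, achievable):
    """Hypothetical rank-based mapping (an assumption, for illustration only).

    Replaces a score by its position in the sorted list of achievable values,
    rescaled to [0, 1], so consecutive achievable values are equally spaced.
    """
    return Fraction(achievable.index(score), len(achievable) - 1)


N = 5
values = rr_values(N)  # 0, 1/5, 1/4, 1/3, 1/2, 1

# Hypothetical per-topic RR scores for two systems (invented data).
system_a = [Fraction(1), Fraction(1, 5)]
system_b = [Fraction(1, 2), Fraction(1, 2)]

# Means on the original RR scale.
raw_a, raw_b = mean(system_a), mean(system_b)

# Means after the order-preserving, equi-spaced mapping.
int_a = mean(to_interval(s, values) for s in system_a)
int_b = mean(to_interval(s, values) for s in system_b)

print(f"raw: A={float(raw_a):.2f} B={float(raw_b):.2f}")
print(f"interval: A={float(int_a):.2f} B={float(int_b):.2f}")
```

Running it prints `raw: A=0.60 B=0.50` and `interval: A=0.60 B=0.80`. The statement "A has a higher mean RR than B" is therefore not invariant under this order-preserving transformation, which illustrates why averaging scores from a scale that is not interval-scaled can lead to conclusions that depend on an arbitrary choice of scale.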
- W3201860539 created "2021-10-11" @default.
- W3201860539 creator A5036327985 @default.
- W3201860539 creator A5054035663 @default.
- W3201860539 creator A5069843101 @default.
- W3201860539 date "2021-01-01" @default.
- W3201860539 modified "2023-10-16" @default.
- W3201860539 title "Towards Meaningful Statements in IR Evaluation: Mapping Evaluation Measures to Interval Scales" @default.
- W3201860539 cites W1579063438 @default.
- W3201860539 cites W1830072110 @default.
- W3201860539 cites W1964405087 @default.
- W3201860539 cites W1965690069 @default.
- W3201860539 cites W1966835268 @default.
- W3201860539 cites W1966893739 @default.
- W3201860539 cites W1967879792 @default.
- W3201860539 cites W1968927634 @default.
- W3201860539 cites W1973294774 @default.
- W3201860539 cites W1974758710 @default.
- W3201860539 cites W1979346010 @default.
- W3201860539 cites W1988619666 @default.
- W3201860539 cites W1989220366 @default.
- W3201860539 cites W1999352670 @default.
- W3201860539 cites W2008525336 @default.
- W3201860539 cites W2015629112 @default.
- W3201860539 cites W2017292914 @default.
- W3201860539 cites W2021682665 @default.
- W3201860539 cites W2021856948 @default.
- W3201860539 cites W2022995284 @default.
- W3201860539 cites W2034173707 @default.
- W3201860539 cites W2035569891 @default.
- W3201860539 cites W2051442094 @default.
- W3201860539 cites W2052569738 @default.
- W3201860539 cites W2053100920 @default.
- W3201860539 cites W2055011662 @default.
- W3201860539 cites W2057069670 @default.
- W3201860539 cites W2057495142 @default.
- W3201860539 cites W2057720927 @default.
- W3201860539 cites W2058413358 @default.
- W3201860539 cites W2058896506 @default.
- W3201860539 cites W2059120814 @default.
- W3201860539 cites W2061190106 @default.
- W3201860539 cites W2063853488 @default.
- W3201860539 cites W2065702061 @default.
- W3201860539 cites W2069870183 @default.
- W3201860539 cites W2072240081 @default.
- W3201860539 cites W2074466695 @default.
- W3201860539 cites W2076227143 @default.
- W3201860539 cites W2077046902 @default.
- W3201860539 cites W2087496379 @default.
- W3201860539 cites W2091425927 @default.
- W3201860539 cites W2091560105 @default.
- W3201860539 cites W2092282607 @default.
- W3201860539 cites W2093397547 @default.
- W3201860539 cites W2101713682 @default.
- W3201860539 cites W2109244020 @default.
- W3201860539 cites W2112014123 @default.
- W3201860539 cites W2113640060 @default.
- W3201860539 cites W2120308175 @default.
- W3201860539 cites W2130076000 @default.
- W3201860539 cites W2131358477 @default.
- W3201860539 cites W2136173837 @default.
- W3201860539 cites W2137274315 @default.
- W3201860539 cites W2156296249 @default.
- W3201860539 cites W2160892561 @default.
- W3201860539 cites W2231711798 @default.
- W3201860539 cites W2318802957 @default.
- W3201860539 cites W2324619102 @default.
- W3201860539 cites W2325957229 @default.
- W3201860539 cites W2336806308 @default.
- W3201860539 cites W2500119194 @default.
- W3201860539 cites W2515650098 @default.
- W3201860539 cites W2740517242 @default.
- W3201860539 cites W2752377997 @default.
- W3201860539 cites W2756726346 @default.
- W3201860539 cites W2794432940 @default.
- W3201860539 cites W2804290038 @default.
- W3201860539 cites W2895814542 @default.
- W3201860539 cites W2933965843 @default.
- W3201860539 cites W2956058978 @default.
- W3201860539 cites W2971801677 @default.
- W3201860539 cites W2974133468 @default.
- W3201860539 cites W3010282573 @default.
- W3201860539 cites W3028964516 @default.
- W3201860539 cites W3034645139 @default.
- W3201860539 cites W3132995484 @default.
- W3201860539 cites W4211226115 @default.
- W3201860539 cites W4213251304 @default.
- W3201860539 cites W4233051669 @default.
- W3201860539 cites W4233371700 @default.
- W3201860539 cites W4234071252 @default.
- W3201860539 cites W4234620507 @default.
- W3201860539 cites W4245385379 @default.
- W3201860539 cites W4247128285 @default.
- W3201860539 cites W4252684946 @default.
- W3201860539 cites W4253799536 @default.
- W3201860539 cites W4256161694 @default.
- W3201860539 cites W4256250826 @default.
- W3201860539 cites W4300410021 @default.