Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387294080> ?p ?o ?g. }
Showing items 1 to 99 of 99 with 100 items per page.
- W4387294080 abstract "Large Language Models (LLMs) have recently been shown to be effective as automatic evaluators with simple prompting and in-context learning. In this work, we assemble 15 LLMs of four different size ranges and evaluate their output responses by preference ranking from the other LLMs as evaluators, such as System Star is better than System Square. We then evaluate the quality of ranking outputs introducing the Cognitive Bias Benchmark for LLMs as Evaluators (CoBBLEr), a benchmark to measure six different cognitive biases in LLM evaluation outputs, such as the Egocentric bias where a model prefers to rank its own outputs highly in evaluation. We find that LLMs are biased text quality evaluators, exhibiting strong indications on our bias benchmark (average of 40% of comparisons across all models) within each of their evaluations that question their robustness as evaluators. Furthermore, we examine the correlation between human and machine preferences and calculate the average Rank-Biased Overlap (RBO) score to be 49.6%, indicating that machine preferences are misaligned with humans. According to our findings, LLMs may still be unable to be utilized for automatic annotation aligned with human preferences. Our project page is at: https://minnesotanlp.github.io/cobbler." @default.
- W4387294080 created "2023-10-03" @default.
- W4387294080 creator A5040821714 @default.
- W4387294080 creator A5042157425 @default.
- W4387294080 creator A5051484319 @default.
- W4387294080 creator A5058564158 @default.
- W4387294080 creator A5066804511 @default.
- W4387294080 creator A5091017131 @default.
- W4387294080 date "2023-09-29" @default.
- W4387294080 modified "2023-10-04" @default.
- W4387294080 title "Benchmarking Cognitive Biases in Large Language Models as Evaluators" @default.
- W4387294080 doi "https://doi.org/10.48550/arxiv.2309.17012" @default.
- W4387294080 hasPublicationYear "2023" @default.
- W4387294080 type Work @default.
- W4387294080 citedByCount "0" @default.
- W4387294080 crossrefType "posted-content" @default.
- W4387294080 hasAuthorship W4387294080A5040821714 @default.
- W4387294080 hasAuthorship W4387294080A5042157425 @default.
- W4387294080 hasAuthorship W4387294080A5051484319 @default.
- W4387294080 hasAuthorship W4387294080A5058564158 @default.
- W4387294080 hasAuthorship W4387294080A5066804511 @default.
- W4387294080 hasAuthorship W4387294080A5091017131 @default.
- W4387294080 hasBestOaLocation W43872940801 @default.
- W4387294080 hasConcept C104317684 @default.
- W4387294080 hasConcept C111472728 @default.
- W4387294080 hasConcept C114614502 @default.
- W4387294080 hasConcept C119857082 @default.
- W4387294080 hasConcept C13280743 @default.
- W4387294080 hasConcept C138885662 @default.
- W4387294080 hasConcept C144133560 @default.
- W4387294080 hasConcept C149782125 @default.
- W4387294080 hasConcept C151730666 @default.
- W4387294080 hasConcept C154945302 @default.
- W4387294080 hasConcept C15744967 @default.
- W4387294080 hasConcept C162324750 @default.
- W4387294080 hasConcept C162853370 @default.
- W4387294080 hasConcept C164226766 @default.
- W4387294080 hasConcept C169760540 @default.
- W4387294080 hasConcept C169900460 @default.
- W4387294080 hasConcept C180747234 @default.
- W4387294080 hasConcept C185592680 @default.
- W4387294080 hasConcept C185798385 @default.
- W4387294080 hasConcept C189216375 @default.
- W4387294080 hasConcept C189430467 @default.
- W4387294080 hasConcept C205649164 @default.
- W4387294080 hasConcept C2779343474 @default.
- W4387294080 hasConcept C2779530757 @default.
- W4387294080 hasConcept C33923547 @default.
- W4387294080 hasConcept C41008148 @default.
- W4387294080 hasConcept C55493867 @default.
- W4387294080 hasConcept C63479239 @default.
- W4387294080 hasConcept C86251818 @default.
- W4387294080 hasConcept C86803240 @default.
- W4387294080 hasConceptScore W4387294080C104317684 @default.
- W4387294080 hasConceptScore W4387294080C111472728 @default.
- W4387294080 hasConceptScore W4387294080C114614502 @default.
- W4387294080 hasConceptScore W4387294080C119857082 @default.
- W4387294080 hasConceptScore W4387294080C13280743 @default.
- W4387294080 hasConceptScore W4387294080C138885662 @default.
- W4387294080 hasConceptScore W4387294080C144133560 @default.
- W4387294080 hasConceptScore W4387294080C149782125 @default.
- W4387294080 hasConceptScore W4387294080C151730666 @default.
- W4387294080 hasConceptScore W4387294080C154945302 @default.
- W4387294080 hasConceptScore W4387294080C15744967 @default.
- W4387294080 hasConceptScore W4387294080C162324750 @default.
- W4387294080 hasConceptScore W4387294080C162853370 @default.
- W4387294080 hasConceptScore W4387294080C164226766 @default.
- W4387294080 hasConceptScore W4387294080C169760540 @default.
- W4387294080 hasConceptScore W4387294080C169900460 @default.
- W4387294080 hasConceptScore W4387294080C180747234 @default.
- W4387294080 hasConceptScore W4387294080C185592680 @default.
- W4387294080 hasConceptScore W4387294080C185798385 @default.
- W4387294080 hasConceptScore W4387294080C189216375 @default.
- W4387294080 hasConceptScore W4387294080C189430467 @default.
- W4387294080 hasConceptScore W4387294080C205649164 @default.
- W4387294080 hasConceptScore W4387294080C2779343474 @default.
- W4387294080 hasConceptScore W4387294080C2779530757 @default.
- W4387294080 hasConceptScore W4387294080C33923547 @default.
- W4387294080 hasConceptScore W4387294080C41008148 @default.
- W4387294080 hasConceptScore W4387294080C55493867 @default.
- W4387294080 hasConceptScore W4387294080C63479239 @default.
- W4387294080 hasConceptScore W4387294080C86251818 @default.
- W4387294080 hasConceptScore W4387294080C86803240 @default.
- W4387294080 hasLocation W43872940801 @default.
- W4387294080 hasOpenAccess W4387294080 @default.
- W4387294080 hasPrimaryLocation W43872940801 @default.
- W4387294080 hasRelatedWork W2018654704 @default.
- W4387294080 hasRelatedWork W2593649365 @default.
- W4387294080 hasRelatedWork W2950577464 @default.
- W4387294080 hasRelatedWork W3170111948 @default.
- W4387294080 hasRelatedWork W4292730586 @default.
- W4387294080 hasRelatedWork W4302612983 @default.
- W4387294080 hasRelatedWork W4312052138 @default.
- W4387294080 hasRelatedWork W4381245711 @default.
- W4387294080 hasRelatedWork W4383605156 @default.
- W4387294080 hasRelatedWork W4287077734 @default.
- W4387294080 isParatext "false" @default.
- W4387294080 isRetracted "false" @default.
- W4387294080 workType "article" @default.