Matches in SemOpenAlex for { <https://semopenalex.org/work/W4378469989> ?p ?o ?g. }
Showing items 1 to 63 of 63 with 100 items per page.
- W4378469989 abstract "With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency detection compared to traditional non-LLM methods. However, a closer analysis reveals that most LLMs fail on more complex formulations of the task and exposes issues with existing evaluation benchmarks, affecting evaluation precision. To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits. This new benchmark is 20 times more cost-effective per sample than previous benchmarks and highly reproducible, as we estimate inter-annotator agreement at about 0.9. Most LLMs struggle on SummEdits, with performance close to random chance. The best-performing model, GPT-4, is still 8% below estimated human performance, highlighting the gaps in LLMs' ability to reason about facts and detect inconsistencies when they occur." @default.
- W4378469989 created "2023-05-27" @default.
- W4378469989 creator A5004155361 @default.
- W4378469989 creator A5005443526 @default.
- W4378469989 creator A5023648764 @default.
- W4378469989 creator A5032046813 @default.
- W4378469989 creator A5050818189 @default.
- W4378469989 creator A5066791810 @default.
- W4378469989 creator A5070892989 @default.
- W4378469989 date "2023-05-23" @default.
- W4378469989 modified "2023-09-27" @default.
- W4378469989 title "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" @default.
- W4378469989 doi "https://doi.org/10.48550/arxiv.2305.14540" @default.
- W4378469989 hasPublicationYear "2023" @default.
- W4378469989 type Work @default.
- W4378469989 citedByCount "0" @default.
- W4378469989 crossrefType "posted-content" @default.
- W4378469989 hasAuthorship W4378469989A5004155361 @default.
- W4378469989 hasAuthorship W4378469989A5005443526 @default.
- W4378469989 hasAuthorship W4378469989A5023648764 @default.
- W4378469989 hasAuthorship W4378469989A5032046813 @default.
- W4378469989 hasAuthorship W4378469989A5050818189 @default.
- W4378469989 hasAuthorship W4378469989A5066791810 @default.
- W4378469989 hasAuthorship W4378469989A5070892989 @default.
- W4378469989 hasBestOaLocation W43784699891 @default.
- W4378469989 hasConcept C112930515 @default.
- W4378469989 hasConcept C13280743 @default.
- W4378469989 hasConcept C144133560 @default.
- W4378469989 hasConcept C154945302 @default.
- W4378469989 hasConcept C162324750 @default.
- W4378469989 hasConcept C185798385 @default.
- W4378469989 hasConcept C187736073 @default.
- W4378469989 hasConcept C205649164 @default.
- W4378469989 hasConcept C2776436953 @default.
- W4378469989 hasConcept C2780451532 @default.
- W4378469989 hasConcept C41008148 @default.
- W4378469989 hasConceptScore W4378469989C112930515 @default.
- W4378469989 hasConceptScore W4378469989C13280743 @default.
- W4378469989 hasConceptScore W4378469989C144133560 @default.
- W4378469989 hasConceptScore W4378469989C154945302 @default.
- W4378469989 hasConceptScore W4378469989C162324750 @default.
- W4378469989 hasConceptScore W4378469989C185798385 @default.
- W4378469989 hasConceptScore W4378469989C187736073 @default.
- W4378469989 hasConceptScore W4378469989C205649164 @default.
- W4378469989 hasConceptScore W4378469989C2776436953 @default.
- W4378469989 hasConceptScore W4378469989C2780451532 @default.
- W4378469989 hasConceptScore W4378469989C41008148 @default.
- W4378469989 hasLocation W43784699891 @default.
- W4378469989 hasOpenAccess W4378469989 @default.
- W4378469989 hasPrimaryLocation W43784699891 @default.
- W4378469989 hasRelatedWork W112744582 @default.
- W4378469989 hasRelatedWork W1485630101 @default.
- W4378469989 hasRelatedWork W1490303524 @default.
- W4378469989 hasRelatedWork W2030059621 @default.
- W4378469989 hasRelatedWork W2070338563 @default.
- W4378469989 hasRelatedWork W2081647779 @default.
- W4378469989 hasRelatedWork W2350879319 @default.
- W4378469989 hasRelatedWork W2353865532 @default.
- W4378469989 hasRelatedWork W2498017833 @default.
- W4378469989 hasRelatedWork W3081841992 @default.
- W4378469989 isParatext "false" @default.
- W4378469989 isRetracted "false" @default.
- W4378469989 workType "article" @default.