Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387560265> ?p ?o ?g. }
Showing items 1 to 55 of 55 with 100 items per page.
- W4387560265 abstract "Using large language models (LLMs) to evaluate text quality has recently gained popularity. Some prior works explore the idea of using LLMs for evaluation, while they differ in some details of the evaluation process. In this paper, we analyze LLM evaluation (Chiang and Lee, 2023) and G-Eval (Liu et al., 2023), and we discuss how those details in the evaluation process change how well the ratings given by LLMs correlate with human ratings. We find that the auto Chain-of-Thought (CoT) used in G-Eval does not always make G-Eval more aligned with human ratings. We also show that forcing the LLM to output only a numeric rating, as in G-Eval, is suboptimal. Last, we reveal that asking the LLM to explain its own ratings consistently improves the correlation between the ChatGPT and human ratings and pushes state-of-the-art (SoTA) correlations on two meta-evaluation datasets." @default.
- W4387560265 created "2023-10-12" @default.
- W4387560265 creator A5040508737 @default.
- W4387560265 creator A5071482462 @default.
- W4387560265 date "2023-10-09" @default.
- W4387560265 modified "2023-10-18" @default.
- W4387560265 title "A Closer Look into Automatic Evaluation Using Large Language Models" @default.
- W4387560265 doi "https://doi.org/10.48550/arxiv.2310.05657" @default.
- W4387560265 hasPublicationYear "2023" @default.
- W4387560265 type Work @default.
- W4387560265 citedByCount "0" @default.
- W4387560265 crossrefType "posted-content" @default.
- W4387560265 hasAuthorship W4387560265A5040508737 @default.
- W4387560265 hasAuthorship W4387560265A5071482462 @default.
- W4387560265 hasBestOaLocation W43875602651 @default.
- W4387560265 hasConcept C111472728 @default.
- W4387560265 hasConcept C134306372 @default.
- W4387560265 hasConcept C138885662 @default.
- W4387560265 hasConcept C154945302 @default.
- W4387560265 hasConcept C15744967 @default.
- W4387560265 hasConcept C197115733 @default.
- W4387560265 hasConcept C204321447 @default.
- W4387560265 hasConcept C2779530757 @default.
- W4387560265 hasConcept C2780586970 @default.
- W4387560265 hasConcept C33923547 @default.
- W4387560265 hasConcept C41008148 @default.
- W4387560265 hasConcept C77805123 @default.
- W4387560265 hasConceptScore W4387560265C111472728 @default.
- W4387560265 hasConceptScore W4387560265C134306372 @default.
- W4387560265 hasConceptScore W4387560265C138885662 @default.
- W4387560265 hasConceptScore W4387560265C154945302 @default.
- W4387560265 hasConceptScore W4387560265C15744967 @default.
- W4387560265 hasConceptScore W4387560265C197115733 @default.
- W4387560265 hasConceptScore W4387560265C204321447 @default.
- W4387560265 hasConceptScore W4387560265C2779530757 @default.
- W4387560265 hasConceptScore W4387560265C2780586970 @default.
- W4387560265 hasConceptScore W4387560265C33923547 @default.
- W4387560265 hasConceptScore W4387560265C41008148 @default.
- W4387560265 hasConceptScore W4387560265C77805123 @default.
- W4387560265 hasLocation W43875602651 @default.
- W4387560265 hasOpenAccess W4387560265 @default.
- W4387560265 hasPrimaryLocation W43875602651 @default.
- W4387560265 hasRelatedWork W2142306706 @default.
- W4387560265 hasRelatedWork W2348524959 @default.
- W4387560265 hasRelatedWork W2368049389 @default.
- W4387560265 hasRelatedWork W2368605798 @default.
- W4387560265 hasRelatedWork W2384861574 @default.
- W4387560265 hasRelatedWork W2518037665 @default.
- W4387560265 hasRelatedWork W2952704802 @default.
- W4387560265 hasRelatedWork W308652608 @default.
- W4387560265 hasRelatedWork W3192589309 @default.
- W4387560265 hasRelatedWork W4294565801 @default.
- W4387560265 isParatext "false" @default.
- W4387560265 isRetracted "false" @default.
- W4387560265 workType "article" @default.