SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4380136141> ?p ?o ?g. }

W4380136141 abstract "Instruction-tuned large language models have revolutionized natural language processing and have shown great potential in applications such as conversational agents. These models, such as GPT-4, can not only master language but also solve complex tasks in areas like mathematics, coding, medicine, and law. Despite their impressive capabilities, there is still a lack of comprehensive understanding regarding their full potential, primarily due to the black-box nature of many models and the absence of holistic evaluation studies. To address these challenges, we present INSTRUCTEVAL, a more comprehensive evaluation suite designed specifically for instruction-tuned large language models. Unlike previous works, our evaluation involves a rigorous assessment of models based on problem-solving, writing ability, and alignment to human values. We take a holistic approach to analyze various factors affecting model performance, including the pretraining foundation, instruction-tuning data, and training methods. Our findings reveal that the quality of instruction data is the most crucial factor in scaling model performance. While open-source models demonstrate impressive writing abilities, there is substantial room for improvement in problem-solving and alignment. We are encouraged by the rapid development of models by the open-source community, but we also highlight the need for rigorous evaluation to support claims made about these models. Through INSTRUCTEVAL, we aim to foster a deeper understanding of instruction-tuned models and advancements in their capabilities. INSTRUCTEVAL is publicly available at https://github.com/declare-lab/instruct-eval." @default.
W4380136141 created "2023-06-10" @default.
W4380136141 creator A5032229704 @default.
W4380136141 creator A5033376109 @default.
W4380136141 creator A5086674741 @default.
W4380136141 creator A5087486426 @default.
W4380136141 date "2023-06-07" @default.
W4380136141 modified "2023-09-27" @default.
W4380136141 title "INSTRUCTEVAL: Towards Holistic Evaluation of Instruction-Tuned Large Language Models" @default.
W4380136141 doi "https://doi.org/10.48550/arxiv.2306.04757" @default.
W4380136141 hasPublicationYear "2023" @default.
W4380136141 type Work @default.
W4380136141 citedByCount "0" @default.
W4380136141 crossrefType "posted-content" @default.
W4380136141 hasAuthorship W4380136141A5032229704 @default.
W4380136141 hasAuthorship W4380136141A5033376109 @default.
W4380136141 hasAuthorship W4380136141A5086674741 @default.
W4380136141 hasAuthorship W4380136141A5087486426 @default.
W4380136141 hasBestOaLocation W43801361411 @default.
W4380136141 hasConcept C111472728 @default.
W4380136141 hasConcept C137293760 @default.
W4380136141 hasConcept C138885662 @default.
W4380136141 hasConcept C154945302 @default.
W4380136141 hasConcept C166957645 @default.
W4380136141 hasConcept C199360897 @default.
W4380136141 hasConcept C2522767166 @default.
W4380136141 hasConcept C2777904410 @default.
W4380136141 hasConcept C2779530757 @default.
W4380136141 hasConcept C3018397939 @default.
W4380136141 hasConcept C41008148 @default.
W4380136141 hasConcept C79581498 @default.
W4380136141 hasConcept C95457728 @default.
W4380136141 hasConceptScore W4380136141C111472728 @default.
W4380136141 hasConceptScore W4380136141C137293760 @default.
W4380136141 hasConceptScore W4380136141C138885662 @default.
W4380136141 hasConceptScore W4380136141C154945302 @default.
W4380136141 hasConceptScore W4380136141C166957645 @default.
W4380136141 hasConceptScore W4380136141C199360897 @default.
W4380136141 hasConceptScore W4380136141C2522767166 @default.
W4380136141 hasConceptScore W4380136141C2777904410 @default.
W4380136141 hasConceptScore W4380136141C2779530757 @default.
W4380136141 hasConceptScore W4380136141C3018397939 @default.
W4380136141 hasConceptScore W4380136141C41008148 @default.
W4380136141 hasConceptScore W4380136141C79581498 @default.
W4380136141 hasConceptScore W4380136141C95457728 @default.
W4380136141 hasLocation W43801361411 @default.
W4380136141 hasOpenAccess W4380136141 @default.
W4380136141 hasPrimaryLocation W43801361411 @default.
W4380136141 hasRelatedWork W1989705153 @default.
W4380136141 hasRelatedWork W2057087473 @default.
W4380136141 hasRelatedWork W2093683727 @default.
W4380136141 hasRelatedWork W2363501692 @default.
W4380136141 hasRelatedWork W2392714184 @default.
W4380136141 hasRelatedWork W2465616004 @default.
W4380136141 hasRelatedWork W2496228846 @default.
W4380136141 hasRelatedWork W2896411932 @default.
W4380136141 hasRelatedWork W3005935371 @default.
W4380136141 hasRelatedWork W4283071077 @default.
W4380136141 isParatext "false" @default.
W4380136141 isRetracted "false" @default.
W4380136141 workType "article" @default.
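
The listing above is the result of matching the pattern `<https://semopenalex.org/work/W4380136141> ?p ?o ?g`. As a minimal sketch, the same lookup could be written as a standard SPARQL query string. The helper name `build_work_query` and the ID check are assumptions for illustration, not part of SemOpenAlex. Because every row above is in the `@default` graph, the sketch queries the default graph and omits `?g`. No endpoint URL is given here, so the code only builds the query text.

```python
# Base IRI taken from the pattern shown at the top of the listing.
WORK_BASE = "https://semopenalex.org/work/"


def build_work_query(work_id: str) -> str:
    """Return a SPARQL SELECT query listing every predicate/object of one work.

    Hypothetical helper: the name and the validation rule are illustrative.
    """
    # Work IDs in the listing look like "W" followed by digits.
    if not (work_id.startswith("W") and work_id[1:].isdigit()):
        raise ValueError(f"unexpected work ID: {work_id!r}")
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{WORK_BASE}{work_id}> ?p ?o .\n"
        "}"
    )


query = build_work_query("W4380136141")
print(query)
```

The resulting string can be submitted to whichever SPARQL endpoint serves the data. Each result row corresponds to one line of the listing, with `?p` as the predicate (for example `title` or `hasConcept`) and `?o` as its value.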