Matches in SemOpenAlex for { <https://semopenalex.org/work/W4379540389> ?p ?o ?g. }
Showing items 1 to 75 of 75, with 100 items per page.
- W4379540389 abstract "Recent advancements in large language models (LLMs) have transformed the field of question answering (QA). However, evaluating LLMs in the medical field is challenging due to the lack of standardized and comprehensive datasets. To address this gap, we introduce CMExam, sourced from the Chinese National Medical Licensing Examination. CMExam consists of 60K+ multiple-choice questions for standardized and objective evaluations, as well as solution explanations for model reasoning evaluation in an open-ended manner. For in-depth analyses of LLMs, we invited medical professionals to label five additional question-wise annotations, including disease groups, clinical departments, medical disciplines, areas of competency, and question difficulty levels. Alongside the dataset, we further conducted thorough experiments with representative LLMs and QA algorithms on CMExam. The results show that GPT-4 had the best accuracy of 61.6% and a weighted F1 score of 0.617. These results highlight a great disparity when compared to human accuracy, which stood at 71.6%. For explanation tasks, while LLMs could generate relevant reasoning and demonstrate improved performance after finetuning, they fall short of a desired standard, indicating ample room for improvement. To the best of our knowledge, CMExam is the first Chinese medical exam dataset to provide comprehensive medical annotations. The experiments and findings of LLM evaluation also provide valuable insights into the challenges and potential solutions in developing Chinese medical QA systems and LLM evaluation pipelines. The dataset and relevant code are available at https://github.com/williamliujl/CMExam." @default.
- W4379540389 created "2023-06-07" @default.
- W4379540389 creator A5018113568 @default.
- W4379540389 creator A5036151193 @default.
- W4379540389 creator A5038837596 @default.
- W4379540389 creator A5039141061 @default.
- W4379540389 creator A5049882309 @default.
- W4379540389 creator A5052723398 @default.
- W4379540389 creator A5068370945 @default.
- W4379540389 creator A5076320750 @default.
- W4379540389 creator A5081953757 @default.
- W4379540389 creator A5083041690 @default.
- W4379540389 creator A5091834117 @default.
- W4379540389 date "2023-06-05" @default.
- W4379540389 modified "2023-09-23" @default.
- W4379540389 title "Benchmarking Large Language Models on CMExam -- A Comprehensive Chinese Medical Exam Dataset" @default.
- W4379540389 doi "https://doi.org/10.48550/arxiv.2306.03030" @default.
- W4379540389 hasPublicationYear "2023" @default.
- W4379540389 type Work @default.
- W4379540389 citedByCount "0" @default.
- W4379540389 crossrefType "posted-content" @default.
- W4379540389 hasAuthorship W4379540389A5018113568 @default.
- W4379540389 hasAuthorship W4379540389A5036151193 @default.
- W4379540389 hasAuthorship W4379540389A5038837596 @default.
- W4379540389 hasAuthorship W4379540389A5039141061 @default.
- W4379540389 hasAuthorship W4379540389A5049882309 @default.
- W4379540389 hasAuthorship W4379540389A5052723398 @default.
- W4379540389 hasAuthorship W4379540389A5068370945 @default.
- W4379540389 hasAuthorship W4379540389A5076320750 @default.
- W4379540389 hasAuthorship W4379540389A5081953757 @default.
- W4379540389 hasAuthorship W4379540389A5083041690 @default.
- W4379540389 hasAuthorship W4379540389A5091834117 @default.
- W4379540389 hasBestOaLocation W43795403891 @default.
- W4379540389 hasConcept C144133560 @default.
- W4379540389 hasConcept C162853370 @default.
- W4379540389 hasConcept C17744445 @default.
- W4379540389 hasConcept C184356942 @default.
- W4379540389 hasConcept C199539241 @default.
- W4379540389 hasConcept C202444582 @default.
- W4379540389 hasConcept C2522767166 @default.
- W4379540389 hasConcept C33923547 @default.
- W4379540389 hasConcept C41008148 @default.
- W4379540389 hasConcept C509550671 @default.
- W4379540389 hasConcept C71924100 @default.
- W4379540389 hasConcept C86251818 @default.
- W4379540389 hasConcept C9652623 @default.
- W4379540389 hasConceptScore W4379540389C144133560 @default.
- W4379540389 hasConceptScore W4379540389C162853370 @default.
- W4379540389 hasConceptScore W4379540389C17744445 @default.
- W4379540389 hasConceptScore W4379540389C184356942 @default.
- W4379540389 hasConceptScore W4379540389C199539241 @default.
- W4379540389 hasConceptScore W4379540389C202444582 @default.
- W4379540389 hasConceptScore W4379540389C2522767166 @default.
- W4379540389 hasConceptScore W4379540389C33923547 @default.
- W4379540389 hasConceptScore W4379540389C41008148 @default.
- W4379540389 hasConceptScore W4379540389C509550671 @default.
- W4379540389 hasConceptScore W4379540389C71924100 @default.
- W4379540389 hasConceptScore W4379540389C86251818 @default.
- W4379540389 hasConceptScore W4379540389C9652623 @default.
- W4379540389 hasLocation W43795403891 @default.
- W4379540389 hasOpenAccess W4379540389 @default.
- W4379540389 hasPrimaryLocation W43795403891 @default.
- W4379540389 hasRelatedWork W1976176382 @default.
- W4379540389 hasRelatedWork W2052836219 @default.
- W4379540389 hasRelatedWork W2053225275 @default.
- W4379540389 hasRelatedWork W2132246391 @default.
- W4379540389 hasRelatedWork W2565660773 @default.
- W4379540389 hasRelatedWork W2794416352 @default.
- W4379540389 hasRelatedWork W2899084033 @default.
- W4379540389 hasRelatedWork W2991936270 @default.
- W4379540389 hasRelatedWork W650706805 @default.
- W4379540389 hasRelatedWork W2888551003 @default.
- W4379540389 isParatext "false" @default.
- W4379540389 isRetracted "false" @default.
- W4379540389 workType "article" @default.
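The listing above is the result of the triple pattern `{ <https://semopenalex.org/work/W4379540389> ?p ?o ?g. }`. The same data can be fetched programmatically with a SPARQL query. The sketch below is a minimal example using only the Python standard library; it assumes SemOpenAlex exposes a public SPARQL endpoint at `https://semopenalex.org/sparql` returning standard `application/sparql-results+json` (the endpoint URL and response shape are assumptions, not confirmed by the listing). The `?g` graph variable is dropped here since every triple above sits in the default graph.

```python
# Sketch: fetch all predicate/object pairs for a SemOpenAlex work.
# The endpoint URL below is an assumption; adjust if it differs.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public endpoint


def build_query(work_id: str) -> str:
    """Build a SPARQL query matching the pattern shown in the listing."""
    uri = f"https://semopenalex.org/work/{work_id}"
    return f"SELECT ?p ?o WHERE {{ <{uri}> ?p ?o . }}"


def fetch_triples(work_id: str) -> list[tuple[str, str]]:
    """Run the query against the endpoint, returning (predicate, object) pairs."""
    params = urllib.parse.urlencode({"query": build_query(work_id)})
    req = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # Standard SPARQL JSON results: data["results"]["bindings"] is a list
    # of {"p": {"value": ...}, "o": {"value": ...}} dictionaries.
    return [
        (b["p"]["value"], b["o"]["value"])
        for b in data["results"]["bindings"]
    ]


# Usage (requires network access):
#   for p, o in fetch_triples("W4379540389"):
#       print(p, o)
```

With the work ID `W4379540389`, the returned pairs would correspond to the 75 items listed above (abstract, creators, authorships, concepts, related works, and so on).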