Matches in SemOpenAlex for { <https://semopenalex.org/work/W4319460874> ?p ?o ?g. }
Showing items 1 to 87 of 87, with 100 items per page.
- W4319460874 endingPage "e45312" @default.
- W4319460874 startingPage "e45312" @default.
- W4319460874 abstract "Background Chat Generative Pre-trained Transformer (ChatGPT) is a 175-billion-parameter natural language processing model that can generate conversation-style responses to user input. Objective This study aimed to evaluate the performance of ChatGPT on questions within the scope of the United States Medical Licensing Examination Step 1 and Step 2 exams, as well as to analyze responses for user interpretability. Methods We used 2 sets of multiple-choice questions to evaluate ChatGPT’s performance, each with questions pertaining to Step 1 and Step 2. The first set was derived from AMBOSS, a commonly used question bank for medical students, which also provides statistics on question difficulty and the performance on an exam relative to the user base. The second set was the National Board of Medical Examiners (NBME) free 120 questions. ChatGPT’s performance was compared to 2 other large language models, GPT-3 and InstructGPT. The text output of each ChatGPT response was evaluated across 3 qualitative metrics: logical justification of the answer selected, presence of information internal to the question, and presence of information external to the question. Results Of the 4 data sets, AMBOSS-Step1, AMBOSS-Step2, NBME-Free-Step1, and NBME-Free-Step2, ChatGPT achieved accuracies of 44% (44/100), 42% (42/100), 64.4% (56/87), and 57.8% (59/102), respectively. ChatGPT outperformed InstructGPT by 8.15% on average across all data sets, and GPT-3 performed similarly to random chance. The model demonstrated a significant decrease in performance as question difficulty increased (P=.01) within the AMBOSS-Step1 data set. We found that logical justification for ChatGPT’s answer selection was present in 100% of outputs of the NBME data sets. Internal information to the question was present in 96.8% (183/189) of all questions. The presence of information external to the question was 44.5% and 27% lower for incorrect answers relative to correct answers on the NBME-Free-Step1 (P<.001) and NBME-Free-Step2 (P=.001) data sets, respectively. Conclusions ChatGPT marks a significant improvement in natural language processing models on the tasks of medical question answering. By performing at a greater than 60% threshold on the NBME-Free-Step-1 data set, we show that the model achieves the equivalent of a passing score for a third-year medical student. Additionally, we highlight ChatGPT’s capacity to provide logic and informational context across the majority of answers. These facts taken together make a compelling case for the potential applications of ChatGPT as an interactive medical education tool to support learning." @default.
- W4319460874 created "2023-02-09" @default.
- W4319460874 creator A5009586285 @default.
- W4319460874 creator A5012426158 @default.
- W4319460874 creator A5019426692 @default.
- W4319460874 creator A5048675944 @default.
- W4319460874 creator A5049597319 @default.
- W4319460874 creator A5078468272 @default.
- W4319460874 creator A5088607405 @default.
- W4319460874 date "2023-02-08" @default.
- W4319460874 modified "2023-10-13" @default.
- W4319460874 title "How Does ChatGPT Perform on the United States Medical Licensing Examination? The Implications of Large Language Models for Medical Education and Knowledge Assessment" @default.
- W4319460874 cites W1995903131 @default.
- W4319460874 cites W2058046738 @default.
- W4319460874 cites W2113671972 @default.
- W4319460874 cites W2132053012 @default.
- W4319460874 cites W2211637316 @default.
- W4319460874 cites W2911109671 @default.
- W4319460874 cites W2952752960 @default.
- W4319460874 cites W2972522091 @default.
- W4319460874 cites W2987501933 @default.
- W4319460874 cites W3081747209 @default.
- W4319460874 cites W3090073303 @default.
- W4319460874 cites W3162922479 @default.
- W4319460874 cites W4281252097 @default.
- W4319460874 cites W4285124505 @default.
- W4319460874 cites W4286985375 @default.
- W4319460874 cites W4292779060 @default.
- W4319460874 cites W4296415471 @default.
- W4319460874 cites W4385245566 @default.
- W4319460874 cites W61072347 @default.
- W4319460874 doi "https://doi.org/10.2196/45312" @default.
- W4319460874 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36753318" @default.
- W4319460874 hasPublicationYear "2023" @default.
- W4319460874 type Work @default.
- W4319460874 citedByCount "274" @default.
- W4319460874 countsByYear W43194608742023 @default.
- W4319460874 crossrefType "journal-article" @default.
- W4319460874 hasAuthorship W4319460874A5009586285 @default.
- W4319460874 hasAuthorship W4319460874A5012426158 @default.
- W4319460874 hasAuthorship W4319460874A5019426692 @default.
- W4319460874 hasAuthorship W4319460874A5048675944 @default.
- W4319460874 hasAuthorship W4319460874A5049597319 @default.
- W4319460874 hasAuthorship W4319460874A5078468272 @default.
- W4319460874 hasAuthorship W4319460874A5088607405 @default.
- W4319460874 hasBestOaLocation W43194608741 @default.
- W4319460874 hasConcept C119857082 @default.
- W4319460874 hasConcept C154945302 @default.
- W4319460874 hasConcept C15744967 @default.
- W4319460874 hasConcept C177264268 @default.
- W4319460874 hasConcept C199360897 @default.
- W4319460874 hasConcept C204321447 @default.
- W4319460874 hasConcept C2777200299 @default.
- W4319460874 hasConcept C2781067378 @default.
- W4319460874 hasConcept C41008148 @default.
- W4319460874 hasConcept C46312422 @default.
- W4319460874 hasConceptScore W4319460874C119857082 @default.
- W4319460874 hasConceptScore W4319460874C154945302 @default.
- W4319460874 hasConceptScore W4319460874C15744967 @default.
- W4319460874 hasConceptScore W4319460874C177264268 @default.
- W4319460874 hasConceptScore W4319460874C199360897 @default.
- W4319460874 hasConceptScore W4319460874C204321447 @default.
- W4319460874 hasConceptScore W4319460874C2777200299 @default.
- W4319460874 hasConceptScore W4319460874C2781067378 @default.
- W4319460874 hasConceptScore W4319460874C41008148 @default.
- W4319460874 hasConceptScore W4319460874C46312422 @default.
- W4319460874 hasLocation W43194608741 @default.
- W4319460874 hasLocation W43194608742 @default.
- W4319460874 hasLocation W43194608743 @default.
- W4319460874 hasOpenAccess W4319460874 @default.
- W4319460874 hasPrimaryLocation W43194608741 @default.
- W4319460874 hasRelatedWork W2496949096 @default.
- W4319460874 hasRelatedWork W3006943036 @default.
- W4319460874 hasRelatedWork W4200511449 @default.
- W4319460874 hasRelatedWork W4206534706 @default.
- W4319460874 hasRelatedWork W4229079080 @default.
- W4319460874 hasRelatedWork W4299487748 @default.
- W4319460874 hasRelatedWork W4385767940 @default.
- W4319460874 hasRelatedWork W4385957992 @default.
- W4319460874 hasRelatedWork W4385965371 @default.
- W4319460874 hasRelatedWork W4386025632 @default.
- W4319460874 hasVolume "9" @default.
- W4319460874 isParatext "false" @default.
- W4319460874 isRetracted "false" @default.
- W4319460874 workType "article" @default.
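The triples above come from the query pattern shown at the top of the page. A minimal Python sketch of retrieving the same data programmatically is below; it assumes SemOpenAlex exposes a standard SPARQL 1.1 Protocol endpoint at `https://semopenalex.org/sparql` returning `application/sparql-results+json` (endpoint URL and result shape are assumptions, not taken from this page — adjust to the actual service).

```python
# Sketch: fetch all predicate/object pairs for a SemOpenAlex work.
# ENDPOINT is an assumed SPARQL endpoint URL; verify against the
# SemOpenAlex documentation before relying on it.
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://semopenalex.org/sparql"  # assumed, not confirmed
WORK = "https://semopenalex.org/work/W4319460874"


def build_query(work_uri: str) -> str:
    """Build the SPARQL query matching the pattern shown on the page:
    every ?p ?o for the given work IRI."""
    return f"SELECT ?p ?o WHERE {{ <{work_uri}> ?p ?o . }}"


def fetch_triples(work_uri: str) -> list[tuple[str, str]]:
    """POST the query and parse SPARQL JSON results (requires network)."""
    data = urllib.parse.urlencode({"query": build_query(work_uri)}).encode()
    req = urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Each binding maps variable names to {"type": ..., "value": ...} dicts.
    return [
        (b["p"]["value"], b["o"]["value"])
        for b in body["results"]["bindings"]
    ]


if __name__ == "__main__":
    print(build_query(WORK))
```

With the 87 items on this page, `fetch_triples(WORK)` would return 87 `(predicate, object)` pairs, e.g. the `citedByCount` predicate paired with `"274"`.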