Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386457251> ?p ?o ?g. }
Showing items 1 to 92 of 92, with 100 items per page.
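The header shows the raw quad pattern behind this listing; the `@default` suffix on each row indicates the default graph, so a plain triple pattern is enough to reproduce it. Below is a minimal Python sketch for fetching the same statements programmatically, assuming the public SemOpenAlex SPARQL endpoint at https://semopenalex.org/sparql (adjust the URL if the endpoint has moved):

```python
# Minimal sketch: list all predicate/object pairs for this work via the
# SemOpenAlex SPARQL endpoint, using the standard SPARQL 1.1 protocol
# (GET with a "query" parameter and a JSON results Accept header).
import requests

ENDPOINT = "https://semopenalex.org/sparql"  # assumed public endpoint
QUERY = """
SELECT ?p ?o WHERE {
  <https://semopenalex.org/work/W4386457251> ?p ?o .
}
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
resp.raise_for_status()

# Each binding mirrors one row of the listing below.
for binding in resp.json()["results"]["bindings"]:
    print(binding["p"]["value"], binding["o"]["value"])
```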
- W4386457251 endingPage "e50514" @default.
- W4386457251 startingPage "e50514" @default.
- W4386457251 abstract "Background Large language model (LLM)–based chatbots are evolving at an unprecedented pace with the release of ChatGPT, specifically GPT-3.5, and its successor, GPT-4. Their capabilities in general-purpose tasks and language generation have advanced to the point of excelling on various educational examination benchmarks, including medical knowledge tests. Comparing the performance of these 2 LLMs to that of Family Medicine residents on a multiple-choice medical knowledge test can provide insights into their potential as medical education tools. Objective This study aimed to quantitatively and qualitatively compare the performance of GPT-3.5, GPT-4, and Family Medicine residents on a multiple-choice medical knowledge test appropriate for the level of a Family Medicine resident. Methods An official University of Toronto Department of Family and Community Medicine Progress Test consisting of multiple-choice questions was inputted into GPT-3.5 and GPT-4. The artificial intelligence chatbots' responses were manually reviewed to determine the selected answer, response length, response time, provision of a rationale for the outputted response, and the root cause of all incorrect responses (classified into arithmetic, logical, and information errors). The performance of the artificial intelligence chatbots was compared against that of a cohort of Family Medicine residents who concurrently attempted the test. Results GPT-4 performed significantly better than GPT-3.5 (difference 25.0%, 95% CI 16.3%-32.8%; McNemar test: P<.001); it correctly answered 89/108 (82.4%) questions, while GPT-3.5 answered 62/108 (57.4%) questions correctly. Further, GPT-4 scored higher across all 11 categories of Family Medicine knowledge. In 86.1% (n=93) of the responses, GPT-4 provided a rationale for why other multiple-choice options were not chosen, compared to the 16.7% (n=18) achieved by GPT-3.5. Qualitatively, for both GPT-3.5 and GPT-4 responses, logical errors were the most common, while arithmetic errors were the least common. The average performance of Family Medicine residents was 56.9% (95% CI 56.2%-57.6%). The performance of GPT-3.5 was similar to that of the average Family Medicine resident (P=.16), while the performance of GPT-4 exceeded that of the top-performing Family Medicine resident (P<.001). Conclusions GPT-4 significantly outperforms both GPT-3.5 and Family Medicine residents on a multiple-choice medical knowledge test designed for Family Medicine residents. GPT-4 provides a logical rationale for its response choice, ruling out other answer choices efficiently and with concise justification. Its high degree of accuracy and advanced reasoning capabilities facilitate its potential applications in medical education, including the creation of exam questions and scenarios as well as serving as a resource for medical knowledge or information on community services." @default.
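The abstract's headline comparison is a McNemar test on the 108 questions answered by both models. The record gives only the marginals (89/108 vs 62/108 correct), not the 2x2 discordant counts, so the sketch below uses hypothetical values of b and c chosen to be consistent with those marginals (b - c must equal 27); it illustrates the test's mechanics, not the paper's actual data:

```python
# Hedged sketch of the paired comparison described in the abstract.
# b and c are HYPOTHETICAL discordant counts consistent with the
# published marginals (89/108 vs 62/108 correct); the true values
# are not in this record.
from statsmodels.stats.contingency_tables import mcnemar

b = 32  # hypothetical: GPT-4 correct, GPT-3.5 incorrect
c = 5   # hypothetical: GPT-4 incorrect, GPT-3.5 correct
both_correct = 89 - b       # 57
both_incorrect = 19 - c     # 14 (GPT-4 answered 19 incorrectly)

table = [[both_correct, b],
         [c, both_incorrect]]  # cells sum to 108

result = mcnemar(table, exact=True)  # exact binomial test on the discordant pairs
print(f"statistic={result.statistic}, p={result.pvalue:.4f}")
```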
- W4386457251 created "2023-09-06" @default.
- W4386457251 creator A5002679050 @default.
- W4386457251 creator A5027560185 @default.
- W4386457251 creator A5028314591 @default.
- W4386457251 creator A5068472891 @default.
- W4386457251 creator A5068734036 @default.
- W4386457251 creator A5088217018 @default.
- W4386457251 date "2023-09-19" @default.
- W4386457251 modified "2023-10-16" @default.
- W4386457251 title "Assessment of Resident and Artificial Intelligence Chatbot Performance on the University of Toronto Family Medicine Residency Progress Test: A Comparative Study (Preprint)" @default.
- W4386457251 cites W1555128924 @default.
- W4386457251 cites W2069704010 @default.
- W4386457251 cites W2140208577 @default.
- W4386457251 cites W2473912624 @default.
- W4386457251 cites W4309364917 @default.
- W4386457251 cites W4309674289 @default.
- W4386457251 cites W4319350602 @default.
- W4386457251 cites W4319460874 @default.
- W4386457251 cites W4319662928 @default.
- W4386457251 cites W4320736349 @default.
- W4386457251 cites W4321351832 @default.
- W4386457251 cites W4322759267 @default.
- W4386457251 cites W4323266626 @default.
- W4386457251 cites W4362515116 @default.
- W4386457251 cites W4365512576 @default.
- W4386457251 cites W4377098551 @default.
- W4386457251 cites W4380685958 @default.
- W4386457251 cites W4385786223 @default.
- W4386457251 doi "https://doi.org/10.2196/50514" @default.
- W4386457251 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37725411" @default.
- W4386457251 hasPublicationYear "2023" @default.
- W4386457251 type Work @default.
- W4386457251 citedByCount "0" @default.
- W4386457251 crossrefType "journal-article" @default.
- W4386457251 hasAuthorship W4386457251A5002679050 @default.
- W4386457251 hasAuthorship W4386457251A5027560185 @default.
- W4386457251 hasAuthorship W4386457251A5028314591 @default.
- W4386457251 hasAuthorship W4386457251A5068472891 @default.
- W4386457251 hasAuthorship W4386457251A5068734036 @default.
- W4386457251 hasAuthorship W4386457251A5088217018 @default.
- W4386457251 hasBestOaLocation W43864572511 @default.
- W4386457251 hasConcept C105795698 @default.
- W4386457251 hasConcept C13280743 @default.
- W4386457251 hasConcept C151730666 @default.
- W4386457251 hasConcept C154945302 @default.
- W4386457251 hasConcept C15744967 @default.
- W4386457251 hasConcept C186282968 @default.
- W4386457251 hasConcept C205649164 @default.
- W4386457251 hasConcept C2777267654 @default.
- W4386457251 hasConcept C2777526511 @default.
- W4386457251 hasConcept C33923547 @default.
- W4386457251 hasConcept C41008148 @default.
- W4386457251 hasConcept C509550671 @default.
- W4386457251 hasConcept C512399662 @default.
- W4386457251 hasConcept C71924100 @default.
- W4386457251 hasConcept C86803240 @default.
- W4386457251 hasConceptScore W4386457251C105795698 @default.
- W4386457251 hasConceptScore W4386457251C13280743 @default.
- W4386457251 hasConceptScore W4386457251C151730666 @default.
- W4386457251 hasConceptScore W4386457251C154945302 @default.
- W4386457251 hasConceptScore W4386457251C15744967 @default.
- W4386457251 hasConceptScore W4386457251C186282968 @default.
- W4386457251 hasConceptScore W4386457251C205649164 @default.
- W4386457251 hasConceptScore W4386457251C2777267654 @default.
- W4386457251 hasConceptScore W4386457251C2777526511 @default.
- W4386457251 hasConceptScore W4386457251C33923547 @default.
- W4386457251 hasConceptScore W4386457251C41008148 @default.
- W4386457251 hasConceptScore W4386457251C509550671 @default.
- W4386457251 hasConceptScore W4386457251C512399662 @default.
- W4386457251 hasConceptScore W4386457251C71924100 @default.
- W4386457251 hasConceptScore W4386457251C86803240 @default.
- W4386457251 hasLocation W43864572511 @default.
- W4386457251 hasLocation W43864572512 @default.
- W4386457251 hasOpenAccess W4386457251 @default.
- W4386457251 hasPrimaryLocation W43864572511 @default.
- W4386457251 hasRelatedWork W2064023586 @default.
- W4386457251 hasRelatedWork W2149889956 @default.
- W4386457251 hasRelatedWork W2386723501 @default.
- W4386457251 hasRelatedWork W2387879414 @default.
- W4386457251 hasRelatedWork W2390304029 @default.
- W4386457251 hasRelatedWork W2601473374 @default.
- W4386457251 hasRelatedWork W2899084033 @default.
- W4386457251 hasRelatedWork W3151422078 @default.
- W4386457251 hasRelatedWork W4232149648 @default.
- W4386457251 hasRelatedWork W4250308522 @default.
- W4386457251 hasVolume "9" @default.
- W4386457251 isParatext "false" @default.
- W4386457251 isRetracted "false" @default.
- W4386457251 workType "article" @default.