Matches in SemOpenAlex for { <https://semopenalex.org/work/W4384821243> ?p ?o ?g. }
- W4384821243 abstract "Background: Artificial intelligence (AI) has the potential to dramatically alter healthcare by enhancing how we diagnose and treat disease. One promising AI model is ChatGPT, a large general-purpose language model trained by OpenAI. The chat interface has shown robust, human-level performance on several professional and academic benchmarks. We sought to probe its performance and stability over time on surgical case questions. Methods: We evaluated the performance of ChatGPT-4 on two surgical knowledge assessments: the Surgical Council on Resident Education (SCORE) and a second commonly used knowledge assessment, referred to as Data-B. Questions were entered in two formats: open-ended and multiple choice. ChatGPT outputs were assessed for accuracy and insights by surgeon evaluators. We categorized reasons for model errors and the stability of performance on repeat encounters. Results: A total of 167 SCORE and 112 Data-B questions were presented to the ChatGPT interface. ChatGPT correctly answered 71% and 68% of multiple-choice SCORE and Data-B questions, respectively. For both open-ended and multiple-choice questions, approximately two-thirds of ChatGPT responses contained non-obvious insights. Common reasons for inaccurate responses included: inaccurate information in a complex question (n=16, 36.4%); inaccurate information in a fact-based question (n=11, 25.0%); and accurate information with circumstantial discrepancy (n=6, 13.6%). Upon repeat query, the answer selected by ChatGPT varied for 36.4% of the questions answered inaccurately; the response accuracy changed for 6/16 questions. Conclusion: Consistent with prior findings, we demonstrate robust near- or above-human-level performance of ChatGPT within the surgical domain. Unique to this study, we demonstrate a substantial inconsistency in ChatGPT responses with repeat query. This finding warrants future consideration and presents an opportunity to further train these models to provide safe and consistent responses. Without mental and/or conceptual models, it is unclear whether language models such as ChatGPT would be able to safely assist clinicians in providing care." @default.
- W4384821243 created "2023-07-21" @default.
- W4384821243 creator A5000831058 @default.
- W4384821243 creator A5002917273 @default.
- W4384821243 creator A5026395633 @default.
- W4384821243 creator A5036891574 @default.
- W4384821243 creator A5042842059 @default.
- W4384821243 creator A5044569746 @default.
- W4384821243 date "2023-07-19" @default.
- W4384821243 modified "2023-10-16" @default.
- W4384821243 title "Evaluating Capabilities of Large Language Models: Performance of GPT4 on Surgical Knowledge Assessments" @default.
- W4384821243 cites W1978029960 @default.
- W4384821243 cites W2062321498 @default.
- W4384821243 cites W2790209545 @default.
- W4384821243 cites W2794518994 @default.
- W4384821243 cites W2805089815 @default.
- W4384821243 cites W2944958482 @default.
- W4384821243 cites W2952726619 @default.
- W4384821243 cites W2971361125 @default.
- W4384821243 cites W2980542546 @default.
- W4384821243 cites W2994308387 @default.
- W4384821243 cites W2996280595 @default.
- W4384821243 cites W3004650426 @default.
- W4384821243 cites W3024173558 @default.
- W4384821243 cites W3109246949 @default.
- W4384821243 cites W3110784362 @default.
- W4384821243 cites W3127622810 @default.
- W4384821243 cites W3201437744 @default.
- W4384821243 cites W3215322027 @default.
- W4384821243 cites W4288926427 @default.
- W4384821243 cites W4306247339 @default.
- W4384821243 cites W4308432230 @default.
- W4384821243 cites W4319460874 @default.
- W4384821243 cites W4319662928 @default.
- W4384821243 cites W4321351832 @default.
- W4384821243 cites W4321435202 @default.
- W4384821243 cites W4321436564 @default.
- W4384821243 cites W4321459182 @default.
- W4384821243 cites W4322208207 @default.
- W4384821243 cites W4322500537 @default.
- W4384821243 cites W4322622443 @default.
- W4384821243 cites W4322626024 @default.
- W4384821243 cites W4323050332 @default.
- W4384821243 cites W4323350039 @default.
- W4384821243 cites W4323920689 @default.
- W4384821243 cites W4324308091 @default.
- W4384821243 cites W4324387439 @default.
- W4384821243 cites W4327946446 @default.
- W4384821243 cites W4353016766 @default.
- W4384821243 cites W4360840406 @default.
- W4384821243 cites W4360938283 @default.
- W4384821243 cites W4361284497 @default.
- W4384821243 cites W4362521774 @default.
- W4384821243 doi "https://doi.org/10.1101/2023.07.16.23292743" @default.
- W4384821243 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/37502981" @default.
- W4384821243 hasPublicationYear "2023" @default.
- W4384821243 type Work @default.
- W4384821243 citedByCount "2" @default.
- W4384821243 countsByYear W43848212432023 @default.
- W4384821243 crossrefType "posted-content" @default.
- W4384821243 hasAuthorship W4384821243A5000831058 @default.
- W4384821243 hasAuthorship W4384821243A5002917273 @default.
- W4384821243 hasAuthorship W4384821243A5026395633 @default.
- W4384821243 hasAuthorship W4384821243A5036891574 @default.
- W4384821243 hasAuthorship W4384821243A5042842059 @default.
- W4384821243 hasAuthorship W4384821243A5044569746 @default.
- W4384821243 hasBestOaLocation W43848212431 @default.
- W4384821243 hasConcept C112972136 @default.
- W4384821243 hasConcept C113843644 @default.
- W4384821243 hasConcept C119857082 @default.
- W4384821243 hasConcept C126322002 @default.
- W4384821243 hasConcept C129307140 @default.
- W4384821243 hasConcept C154945302 @default.
- W4384821243 hasConcept C157915830 @default.
- W4384821243 hasConcept C173608175 @default.
- W4384821243 hasConcept C176730311 @default.
- W4384821243 hasConcept C204321447 @default.
- W4384821243 hasConcept C23123220 @default.
- W4384821243 hasConcept C3018023364 @default.
- W4384821243 hasConcept C41008148 @default.
- W4384821243 hasConcept C71924100 @default.
- W4384821243 hasConceptScore W4384821243C112972136 @default.
- W4384821243 hasConceptScore W4384821243C113843644 @default.
- W4384821243 hasConceptScore W4384821243C119857082 @default.
- W4384821243 hasConceptScore W4384821243C126322002 @default.
- W4384821243 hasConceptScore W4384821243C129307140 @default.
- W4384821243 hasConceptScore W4384821243C154945302 @default.
- W4384821243 hasConceptScore W4384821243C157915830 @default.
- W4384821243 hasConceptScore W4384821243C173608175 @default.
- W4384821243 hasConceptScore W4384821243C176730311 @default.
- W4384821243 hasConceptScore W4384821243C204321447 @default.
- W4384821243 hasConceptScore W4384821243C23123220 @default.
- W4384821243 hasConceptScore W4384821243C3018023364 @default.
- W4384821243 hasConceptScore W4384821243C41008148 @default.
- W4384821243 hasConceptScore W4384821243C71924100 @default.
- W4384821243 hasLocation W43848212431 @default.
- W4384821243 hasLocation W43848212432 @default.
- W4384821243 hasLocation W43848212433 @default.
- W4384821243 hasOpenAccess W4384821243 @default.
- W4384821243 hasPrimaryLocation W43848212431 @default.