Matches in SemOpenAlex for { <https://semopenalex.org/work/W4378468466> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4378468466 abstract "Theory of Mind (ToM), the capacity to comprehend the mental states of distinct individuals, is essential for numerous practical applications. With the development of large language models, there is a heated debate about whether they are able to perform ToM tasks. Previous studies have used different tasks and prompts to test ToM in large language models, and the results are inconsistent: some studies asserted these models are capable of exhibiting ToM, while others suggest the opposite. In this study, we present ToMChallenges, a dataset for comprehensively evaluating Theory of Mind based on Sally-Anne and Smarties tests. We created 30 variations of each test (e.g., changing the person's name, location, and items). For each variation, we test the model's understanding of different aspects: reality, belief, 1st order belief, and 2nd order belief. We adapt our data for various tasks by creating unique prompts tailored for each task category: Fill-in-the-Blank, Multiple Choice, True/False, Chain-of-Thought True/False, Question Answering, and Text Completion. If the model has a robust ToM, it should be able to achieve good performance for different prompts across different tests. We evaluated two GPT-3.5 models, text-davinci-003 and gpt-3.5-turbo-0301, with our datasets. Our results indicate that consistent performance in ToM tasks remains a challenge." @default.
- W4378468466 created "2023-05-27" @default.
- W4378468466 creator A5007556918 @default.
- W4378468466 creator A5068804492 @default.
- W4378468466 creator A5087225739 @default.
- W4378468466 date "2023-05-24" @default.
- W4378468466 modified "2023-09-23" @default.
- W4378468466 title "ToMChallenges: A Principle-Guided Dataset and Diverse Evaluation Tasks for Exploring Theory of Mind" @default.
- W4378468466 doi "https://doi.org/10.48550/arxiv.2305.15068" @default.
- W4378468466 hasPublicationYear "2023" @default.
- W4378468466 type Work @default.
- W4378468466 citedByCount "0" @default.
- W4378468466 crossrefType "posted-content" @default.
- W4378468466 hasAuthorship W4378468466A5007556918 @default.
- W4378468466 hasAuthorship W4378468466A5068804492 @default.
- W4378468466 hasAuthorship W4378468466A5087225739 @default.
- W4378468466 hasBestOaLocation W43784684661 @default.
- W4378468466 hasConcept C10138342 @default.
- W4378468466 hasConcept C151730666 @default.
- W4378468466 hasConcept C154945302 @default.
- W4378468466 hasConcept C15744967 @default.
- W4378468466 hasConcept C162324750 @default.
- W4378468466 hasConcept C169760540 @default.
- W4378468466 hasConcept C169900460 @default.
- W4378468466 hasConcept C180747234 @default.
- W4378468466 hasConcept C182306322 @default.
- W4378468466 hasConcept C18591518 @default.
- W4378468466 hasConcept C187736073 @default.
- W4378468466 hasConcept C188147891 @default.
- W4378468466 hasConcept C204321447 @default.
- W4378468466 hasConcept C2777267654 @default.
- W4378468466 hasConcept C2779560602 @default.
- W4378468466 hasConcept C2780451532 @default.
- W4378468466 hasConcept C2994481395 @default.
- W4378468466 hasConcept C41008148 @default.
- W4378468466 hasConcept C86803240 @default.
- W4378468466 hasConceptScore W4378468466C10138342 @default.
- W4378468466 hasConceptScore W4378468466C151730666 @default.
- W4378468466 hasConceptScore W4378468466C154945302 @default.
- W4378468466 hasConceptScore W4378468466C15744967 @default.
- W4378468466 hasConceptScore W4378468466C162324750 @default.
- W4378468466 hasConceptScore W4378468466C169760540 @default.
- W4378468466 hasConceptScore W4378468466C169900460 @default.
- W4378468466 hasConceptScore W4378468466C180747234 @default.
- W4378468466 hasConceptScore W4378468466C182306322 @default.
- W4378468466 hasConceptScore W4378468466C18591518 @default.
- W4378468466 hasConceptScore W4378468466C187736073 @default.
- W4378468466 hasConceptScore W4378468466C188147891 @default.
- W4378468466 hasConceptScore W4378468466C204321447 @default.
- W4378468466 hasConceptScore W4378468466C2777267654 @default.
- W4378468466 hasConceptScore W4378468466C2779560602 @default.
- W4378468466 hasConceptScore W4378468466C2780451532 @default.
- W4378468466 hasConceptScore W4378468466C2994481395 @default.
- W4378468466 hasConceptScore W4378468466C41008148 @default.
- W4378468466 hasConceptScore W4378468466C86803240 @default.
- W4378468466 hasLocation W43784684661 @default.
- W4378468466 hasOpenAccess W4378468466 @default.
- W4378468466 hasPrimaryLocation W43784684661 @default.
- W4378468466 hasRelatedWork W1978180172 @default.
- W4378468466 hasRelatedWork W2021147489 @default.
- W4378468466 hasRelatedWork W2079310480 @default.
- W4378468466 hasRelatedWork W2090634175 @default.
- W4378468466 hasRelatedWork W2161308639 @default.
- W4378468466 hasRelatedWork W2383208641 @default.
- W4378468466 hasRelatedWork W3088904174 @default.
- W4378468466 hasRelatedWork W3131992771 @default.
- W4378468466 hasRelatedWork W3136142 @default.
- W4378468466 hasRelatedWork W4231534512 @default.
- W4378468466 isParatext "false" @default.
- W4378468466 isRetracted "false" @default.
- W4378468466 workType "article" @default.
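
Each entry above has the same shape: a leading dash, a subject, a predicate, an object (either a double-quoted literal or a bare identifier), and the graph marker `@default.`. The following is a minimal sketch, assuming only this plain-text layout, of how such lines could be parsed and grouped by predicate in Python. The names `LINE_RE`, `parse_line`, and `group_by_predicate` are hypothetical helpers for illustration and are not part of any SemOpenAlex API.

```python
import re
from collections import defaultdict

# Assumed layout of one listing line: "- <subject> <predicate> <object> @<graph>."
# The object is either a double-quoted literal or a bare token without spaces.
LINE_RE = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(?:"(.*)"|(\S+))\s+@(\S+)\.$')


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    subject, predicate, literal, token, graph = match.groups()
    obj = literal if literal is not None else token
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect the objects of all parseable lines under their predicate."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    '- W4378468466 created "2023-05-27" @default.',
    '- W4378468466 creator A5007556918 @default.',
    '- W4378468466 creator A5068804492 @default.',
    '- W4378468466 type Work @default.',
]
grouped = group_by_predicate(sample)
```

Lines that do not fit the pattern, such as the pagination header, return `None` from `parse_line` and are skipped by `group_by_predicate`, so the whole listing can be passed in unfiltered.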