Matches in SemOpenAlex for { <https://semopenalex.org/work/W4387561134> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4387561134 abstract "With the increasing capabilities of large language models (LLMs), these high-performance models have achieved state-of-the-art results on a wide range of natural language processing (NLP) tasks. However, the models' performance on commonly-used benchmark datasets often fails to accurately reflect their reliability and robustness when applied to real-world noisy data. To address these challenges, we propose a unified robustness evaluation framework based on the slot-filling task to systematically evaluate the dialogue understanding capability of LLMs in diverse input perturbation scenarios. Specifically, we construct an input perturbation evaluation dataset, Noise-LLM, which contains five types of single perturbation and four types of mixed perturbation data. Furthermore, we utilize a multi-level data augmentation method (character, word, and sentence levels) to construct a candidate data pool, and carefully design two ways of automatic task demonstration construction strategies (instance-level and entity-level) with various prompt templates. Our aim is to assess how well various robustness methods of LLMs perform in real-world noisy scenarios. The experiments have demonstrated that the current open-source LLMs generally achieve limited perturbation robustness performance. Based on these experimental observations, we make some forward-looking suggestions to fuel the research in this direction." @default.
- W4387561134 created "2023-10-12" @default.
- W4387561134 creator A5008160346 @default.
- W4387561134 creator A5008320336 @default.
- W4387561134 creator A5009391425 @default.
- W4387561134 creator A5016651990 @default.
- W4387561134 creator A5017887313 @default.
- W4387561134 creator A5022255710 @default.
- W4387561134 creator A5027438317 @default.
- W4387561134 creator A5046062431 @default.
- W4387561134 creator A5048658005 @default.
- W4387561134 creator A5065626573 @default.
- W4387561134 creator A5093022114 @default.
- W4387561134 date "2023-10-10" @default.
- W4387561134 modified "2023-10-13" @default.
- W4387561134 title "Revisit Input Perturbation Problems for LLMs: A Unified Robustness Evaluation Framework for Noisy Slot Filling Task" @default.
- W4387561134 doi "https://doi.org/10.48550/arxiv.2310.06504" @default.
- W4387561134 hasPublicationYear "2023" @default.
- W4387561134 type Work @default.
- W4387561134 citedByCount "0" @default.
- W4387561134 crossrefType "posted-content" @default.
- W4387561134 hasAuthorship W4387561134A5008160346 @default.
- W4387561134 hasAuthorship W4387561134A5008320336 @default.
- W4387561134 hasAuthorship W4387561134A5009391425 @default.
- W4387561134 hasAuthorship W4387561134A5016651990 @default.
- W4387561134 hasAuthorship W4387561134A5017887313 @default.
- W4387561134 hasAuthorship W4387561134A5022255710 @default.
- W4387561134 hasAuthorship W4387561134A5027438317 @default.
- W4387561134 hasAuthorship W4387561134A5046062431 @default.
- W4387561134 hasAuthorship W4387561134A5048658005 @default.
- W4387561134 hasAuthorship W4387561134A5065626573 @default.
- W4387561134 hasAuthorship W4387561134A5093022114 @default.
- W4387561134 hasBestOaLocation W43875611341 @default.
- W4387561134 hasConcept C104317684 @default.
- W4387561134 hasConcept C119857082 @default.
- W4387561134 hasConcept C121332964 @default.
- W4387561134 hasConcept C154945302 @default.
- W4387561134 hasConcept C177918212 @default.
- W4387561134 hasConcept C185592680 @default.
- W4387561134 hasConcept C2777530160 @default.
- W4387561134 hasConcept C41008148 @default.
- W4387561134 hasConcept C55493867 @default.
- W4387561134 hasConcept C62520636 @default.
- W4387561134 hasConcept C63479239 @default.
- W4387561134 hasConceptScore W4387561134C104317684 @default.
- W4387561134 hasConceptScore W4387561134C119857082 @default.
- W4387561134 hasConceptScore W4387561134C121332964 @default.
- W4387561134 hasConceptScore W4387561134C154945302 @default.
- W4387561134 hasConceptScore W4387561134C177918212 @default.
- W4387561134 hasConceptScore W4387561134C185592680 @default.
- W4387561134 hasConceptScore W4387561134C2777530160 @default.
- W4387561134 hasConceptScore W4387561134C41008148 @default.
- W4387561134 hasConceptScore W4387561134C55493867 @default.
- W4387561134 hasConceptScore W4387561134C62520636 @default.
- W4387561134 hasConceptScore W4387561134C63479239 @default.
- W4387561134 hasLocation W43875611341 @default.
- W4387561134 hasOpenAccess W4387561134 @default.
- W4387561134 hasPrimaryLocation W43875611341 @default.
- W4387561134 hasRelatedWork W2961085424 @default.
- W4387561134 hasRelatedWork W3046775127 @default.
- W4387561134 hasRelatedWork W3107602296 @default.
- W4387561134 hasRelatedWork W3170094116 @default.
- W4387561134 hasRelatedWork W3209574120 @default.
- W4387561134 hasRelatedWork W4210805261 @default.
- W4387561134 hasRelatedWork W4306674287 @default.
- W4387561134 hasRelatedWork W4312192474 @default.
- W4387561134 hasRelatedWork W4386462264 @default.
- W4387561134 hasRelatedWork W4387297750 @default.
- W4387561134 isParatext "false" @default.
- W4387561134 isRetracted "false" @default.
- W4387561134 workType "article" @default.
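
The matches above all follow one line shape: a dash, the subject, a predicate, an object, and a graph label such as `@default`. As a minimal sketch, assuming that shape holds for every line, the listing could be read into tuples and grouped by predicate as shown below. The names `LINE_RE`, `parse_match`, and `group_by_predicate` are hypothetical helpers written for this illustration and are not part of any SemOpenAlex tooling.

```python
import re
from collections import defaultdict

# Assumed line shape (inferred from the listing, not from a specification):
#   "- <subject> <predicate> <object> @<graph>."
LINE_RE = re.compile(r'^- (\S+) (\S+) (.+) @(\S+)\.$')


def parse_match(line):
    """Split one listing line into (subject, predicate, object, graph)."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized line: {line!r}")
    subject, predicate, obj, graph = m.groups()
    # Literal values are displayed in double quotes; strip them for convenience.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect the objects of each predicate, preserving listing order."""
    grouped = defaultdict(list)
    for line in lines:
        _, predicate, obj, _ = parse_match(line)
        grouped[predicate].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    '- W4387561134 created "2023-10-12" @default.',
    '- W4387561134 creator A5008160346 @default.',
    '- W4387561134 creator A5008320336 @default.',
    '- W4387561134 type Work @default.',
]
grouped = group_by_predicate(sample)
```

Grouping by predicate makes repeated properties such as `creator`, `hasConcept`, and `hasRelatedWork` available as lists, while single-valued properties such as `created` become one-element lists.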