Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385958573> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4385958573 abstract "Although perception systems have made remarkable advancements in recent years, they still rely on explicit human instruction to identify the target objects or categories before executing visual recognition tasks. Such systems lack the ability to actively reason and comprehend implicit user intentions. In this work, we propose a new segmentation task -- reasoning segmentation. The task is designed to output a segmentation mask given a complex and implicit query text. Furthermore, we establish a benchmark comprising over one thousand image-instruction pairs, incorporating intricate reasoning and world knowledge for evaluation purposes. Finally, we present LISA: large Language Instructed Segmentation Assistant, which inherits the language generation capabilities of the multi-modal Large Language Model (LLM) while also possessing the ability to produce segmentation masks. We expand the original vocabulary with a <SEG> token and propose the embedding-as-mask paradigm to unlock the segmentation capability. Remarkably, LISA can handle cases involving: 1) complex reasoning; 2) world knowledge; 3) explanatory answers; 4) multi-turn conversation. Also, it demonstrates robust zero-shot capability when trained exclusively on reasoning-free datasets. In addition, fine-tuning the model with merely 239 reasoning segmentation image-instruction pairs results in further performance enhancement. Experiments show our method not only unlocks new reasoning segmentation capabilities but also proves effective in both complex reasoning segmentation and standard referring segmentation tasks. Code, models, and demo are at https://github.com/dvlab-research/LISA." @default.
- W4385958573 created "2023-08-18" @default.
- W4385958573 creator A5002034669 @default.
- W4385958573 creator A5031930852 @default.
- W4385958573 creator A5038086218 @default.
- W4385958573 creator A5052856441 @default.
- W4385958573 creator A5059845882 @default.
- W4385958573 creator A5060855982 @default.
- W4385958573 creator A5062788756 @default.
- W4385958573 date "2023-08-01" @default.
- W4385958573 modified "2023-09-23" @default.
- W4385958573 title "LISA: Reasoning Segmentation via Large Language Model" @default.
- W4385958573 doi "https://doi.org/10.48550/arxiv.2308.00692" @default.
- W4385958573 hasPublicationYear "2023" @default.
- W4385958573 type Work @default.
- W4385958573 citedByCount "0" @default.
- W4385958573 crossrefType "posted-content" @default.
- W4385958573 hasAuthorship W4385958573A5002034669 @default.
- W4385958573 hasAuthorship W4385958573A5031930852 @default.
- W4385958573 hasAuthorship W4385958573A5038086218 @default.
- W4385958573 hasAuthorship W4385958573A5052856441 @default.
- W4385958573 hasAuthorship W4385958573A5059845882 @default.
- W4385958573 hasAuthorship W4385958573A5060855982 @default.
- W4385958573 hasAuthorship W4385958573A5062788756 @default.
- W4385958573 hasBestOaLocation W43859585731 @default.
- W4385958573 hasConcept C119857082 @default.
- W4385958573 hasConcept C13280743 @default.
- W4385958573 hasConcept C137293760 @default.
- W4385958573 hasConcept C138885662 @default.
- W4385958573 hasConcept C154945302 @default.
- W4385958573 hasConcept C162324750 @default.
- W4385958573 hasConcept C185798385 @default.
- W4385958573 hasConcept C187736073 @default.
- W4385958573 hasConcept C204321447 @default.
- W4385958573 hasConcept C205649164 @default.
- W4385958573 hasConcept C2777200299 @default.
- W4385958573 hasConcept C2780451532 @default.
- W4385958573 hasConcept C41008148 @default.
- W4385958573 hasConcept C41895202 @default.
- W4385958573 hasConcept C89600930 @default.
- W4385958573 hasConceptScore W4385958573C119857082 @default.
- W4385958573 hasConceptScore W4385958573C13280743 @default.
- W4385958573 hasConceptScore W4385958573C137293760 @default.
- W4385958573 hasConceptScore W4385958573C138885662 @default.
- W4385958573 hasConceptScore W4385958573C154945302 @default.
- W4385958573 hasConceptScore W4385958573C162324750 @default.
- W4385958573 hasConceptScore W4385958573C185798385 @default.
- W4385958573 hasConceptScore W4385958573C187736073 @default.
- W4385958573 hasConceptScore W4385958573C204321447 @default.
- W4385958573 hasConceptScore W4385958573C205649164 @default.
- W4385958573 hasConceptScore W4385958573C2777200299 @default.
- W4385958573 hasConceptScore W4385958573C2780451532 @default.
- W4385958573 hasConceptScore W4385958573C41008148 @default.
- W4385958573 hasConceptScore W4385958573C41895202 @default.
- W4385958573 hasConceptScore W4385958573C89600930 @default.
- W4385958573 hasLocation W43859585731 @default.
- W4385958573 hasOpenAccess W4385958573 @default.
- W4385958573 hasPrimaryLocation W43859585731 @default.
- W4385958573 hasRelatedWork W112744582 @default.
- W4385958573 hasRelatedWork W1485630101 @default.
- W4385958573 hasRelatedWork W2359001871 @default.
- W4385958573 hasRelatedWork W2496949096 @default.
- W4385958573 hasRelatedWork W2498017833 @default.
- W4385958573 hasRelatedWork W2961085424 @default.
- W4385958573 hasRelatedWork W2983785000 @default.
- W4385958573 hasRelatedWork W4205820553 @default.
- W4385958573 hasRelatedWork W4306674287 @default.
- W4385958573 hasRelatedWork W4378473991 @default.
- W4385958573 isParatext "false" @default.
- W4385958573 isRetracted "false" @default.
- W4385958573 workType "article" @default.