Matches in SemOpenAlex for { <https://semopenalex.org/work/W4385262477> ?p ?o ?g. }
Showing items 1 to 69 of 69 with 100 items per page.
- W4385262477 abstract "Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be, they are not grounded in the 3D physical world, which involves richer concepts such as spatial relationships, affordances, physics, layout, and so on. In this work, we propose to inject the 3D world into large language models and introduce a whole new family of 3D-LLMs. Specifically, 3D-LLMs can take 3D point clouds and their features as input and perform a diverse set of 3D-related tasks, including captioning, dense captioning, 3D question answering, task decomposition, 3D grounding, 3D-assisted dialog, navigation, and so on. Using three types of prompting mechanisms that we design, we are able to collect over 300k 3D-language data covering these tasks. To efficiently train 3D-LLMs, we first utilize a 3D feature extractor that obtains 3D features from rendered multi-view images. Then, we use 2D VLMs as our backbones to train our 3D-LLMs. By introducing a 3D localization mechanism, 3D-LLMs can better capture 3D spatial information. Experiments on ScanQA show that our model outperforms state-of-the-art baselines by a large margin (e.g., the BLEU-1 score surpasses state-of-the-art score by 9%). Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs. Qualitative examples also show that our model could perform more tasks beyond the scope of existing LLMs and VLMs. Project Page: https://vis-www.cs.umass.edu/3dllm/." @default.
- W4385262477 created "2023-07-26" @default.
- W4385262477 creator A5000069468 @default.
- W4385262477 creator A5004837552 @default.
- W4385262477 creator A5030471889 @default.
- W4385262477 creator A5040877128 @default.
- W4385262477 creator A5067388487 @default.
- W4385262477 creator A5075364910 @default.
- W4385262477 creator A5081993175 @default.
- W4385262477 date "2023-07-24" @default.
- W4385262477 modified "2023-09-23" @default.
- W4385262477 title "3D-LLM: Injecting the 3D World into Large Language Models" @default.
- W4385262477 doi "https://doi.org/10.48550/arxiv.2307.12981" @default.
- W4385262477 hasPublicationYear "2023" @default.
- W4385262477 type Work @default.
- W4385262477 citedByCount "0" @default.
- W4385262477 crossrefType "posted-content" @default.
- W4385262477 hasAuthorship W4385262477A5000069468 @default.
- W4385262477 hasAuthorship W4385262477A5004837552 @default.
- W4385262477 hasAuthorship W4385262477A5030471889 @default.
- W4385262477 hasAuthorship W4385262477A5040877128 @default.
- W4385262477 hasAuthorship W4385262477A5067388487 @default.
- W4385262477 hasAuthorship W4385262477A5075364910 @default.
- W4385262477 hasAuthorship W4385262477A5081993175 @default.
- W4385262477 hasBestOaLocation W43852624771 @default.
- W4385262477 hasConcept C115961682 @default.
- W4385262477 hasConcept C127413603 @default.
- W4385262477 hasConcept C137293760 @default.
- W4385262477 hasConcept C154945302 @default.
- W4385262477 hasConcept C157657479 @default.
- W4385262477 hasConcept C177264268 @default.
- W4385262477 hasConcept C199360897 @default.
- W4385262477 hasConcept C201995342 @default.
- W4385262477 hasConcept C204321447 @default.
- W4385262477 hasConcept C2524010 @default.
- W4385262477 hasConcept C2780451532 @default.
- W4385262477 hasConcept C28719098 @default.
- W4385262477 hasConcept C33923547 @default.
- W4385262477 hasConcept C41008148 @default.
- W4385262477 hasConceptScore W4385262477C115961682 @default.
- W4385262477 hasConceptScore W4385262477C127413603 @default.
- W4385262477 hasConceptScore W4385262477C137293760 @default.
- W4385262477 hasConceptScore W4385262477C154945302 @default.
- W4385262477 hasConceptScore W4385262477C157657479 @default.
- W4385262477 hasConceptScore W4385262477C177264268 @default.
- W4385262477 hasConceptScore W4385262477C199360897 @default.
- W4385262477 hasConceptScore W4385262477C201995342 @default.
- W4385262477 hasConceptScore W4385262477C204321447 @default.
- W4385262477 hasConceptScore W4385262477C2524010 @default.
- W4385262477 hasConceptScore W4385262477C2780451532 @default.
- W4385262477 hasConceptScore W4385262477C28719098 @default.
- W4385262477 hasConceptScore W4385262477C33923547 @default.
- W4385262477 hasConceptScore W4385262477C41008148 @default.
- W4385262477 hasLocation W43852624771 @default.
- W4385262477 hasOpenAccess W4385262477 @default.
- W4385262477 hasPrimaryLocation W43852624771 @default.
- W4385262477 hasRelatedWork W1772447446 @default.
- W4385262477 hasRelatedWork W2359001871 @default.
- W4385262477 hasRelatedWork W2547835662 @default.
- W4385262477 hasRelatedWork W2735824434 @default.
- W4385262477 hasRelatedWork W2963898017 @default.
- W4385262477 hasRelatedWork W2967344709 @default.
- W4385262477 hasRelatedWork W4200486724 @default.
- W4385262477 hasRelatedWork W4205820553 @default.
- W4385262477 hasRelatedWork W4224006678 @default.
- W4385262477 hasRelatedWork W4323830248 @default.
- W4385262477 isParatext "false" @default.
- W4385262477 isRetracted "false" @default.
- W4385262477 workType "article" @default.