Matches in SemOpenAlex for { <https://semopenalex.org/work/W4229027288> ?p ?o ?g. }
- W4229027288 endingPage "38" @default.
- W4229027288 startingPage "28" @default.
- W4229027288 abstract "Image captioning aims to generate a language description of a given image. The problem can be solved by learning semantic information about visual objects and generating descriptions from the extracted embeddings. However, the spatial relationships between visual objects and their static positions are not fully explored by existing methods. In this work, we propose a Position-Aware Transformer (PAT) model that extracts both regional and static global visual features and unifies them by incorporating spatial information aligned to each visual feature. To better represent spatial information and the correlation between extracted visual features, we propose and compare three subtle approaches to position embedding that encode spatial relation information explicitly. Moreover, we jointly consider the static global and regional embeddings for spatial modeling. Experimental results show that the proposed model achieves competitive performance on the COCO image captioning dataset, where the PAT model reaches 38.7, 28.6, and 58.6 on BLEU-4, METEOR, and ROUGE-L, respectively. Extensive experiments suggest that the proposed PAT model also reaches competitive performance on related vision-language tasks, including visual question answering (VQA) and multi-modal retrieval. Detailed ablation studies report how each component contributes to the final performance, which could serve as a good reference for follow-up work on spatial information representation." @default.
- W4229027288 created "2022-05-08" @default.
- W4229027288 creator A5019704254 @default.
- W4229027288 creator A5058936239 @default.
- W4229027288 creator A5059407219 @default.
- W4229027288 creator A5065058975 @default.
- W4229027288 creator A5085326472 @default.
- W4229027288 date "2022-08-01" @default.
- W4229027288 modified "2023-10-09" @default.
- W4229027288 title "Position-aware image captioning with spatial relation" @default.
- W4229027288 cites W1861492603 @default.
- W4229027288 cites W1895577753 @default.
- W4229027288 cites W1956340063 @default.
- W4229027288 cites W2101105183 @default.
- W4229027288 cites W2133512280 @default.
- W4229027288 cites W2194775991 @default.
- W4229027288 cites W2277195237 @default.
- W4229027288 cites W2481240925 @default.
- W4229027288 cites W2506483933 @default.
- W4229027288 cites W2508429489 @default.
- W4229027288 cites W2552002300 @default.
- W4229027288 cites W2575842049 @default.
- W4229027288 cites W2745461083 @default.
- W4229027288 cites W2887585070 @default.
- W4229027288 cites W2890531016 @default.
- W4229027288 cites W2962964995 @default.
- W4229027288 cites W2963062932 @default.
- W4229027288 cites W2963084599 @default.
- W4229027288 cites W2963743213 @default.
- W4229027288 cites W2964018924 @default.
- W4229027288 cites W2964051675 @default.
- W4229027288 cites W2964067226 @default.
- W4229027288 cites W2965848243 @default.
- W4229027288 cites W2966683369 @default.
- W4229027288 cites W2972528742 @default.
- W4229027288 cites W2972897806 @default.
- W4229027288 cites W2997591391 @default.
- W4229027288 cites W3002557610 @default.
- W4229027288 cites W3016211260 @default.
- W4229027288 cites W3035160838 @default.
- W4229027288 cites W3035284526 @default.
- W4229027288 cites W3039115681 @default.
- W4229027288 cites W3101313921 @default.
- W4229027288 cites W639708223 @default.
- W4229027288 doi "https://doi.org/10.1016/j.neucom.2022.05.003" @default.
- W4229027288 hasPublicationYear "2022" @default.
- W4229027288 type Work @default.
- W4229027288 citedByCount "5" @default.
- W4229027288 countsByYear W42290272882022 @default.
- W4229027288 countsByYear W42290272882023 @default.
- W4229027288 crossrefType "journal-article" @default.
- W4229027288 hasAuthorship W4229027288A5019704254 @default.
- W4229027288 hasAuthorship W4229027288A5058936239 @default.
- W4229027288 hasAuthorship W4229027288A5059407219 @default.
- W4229027288 hasAuthorship W4229027288A5065058975 @default.
- W4229027288 hasAuthorship W4229027288A5085326472 @default.
- W4229027288 hasConcept C10138342 @default.
- W4229027288 hasConcept C105795698 @default.
- W4229027288 hasConcept C115961682 @default.
- W4229027288 hasConcept C124101348 @default.
- W4229027288 hasConcept C138885662 @default.
- W4229027288 hasConcept C153180895 @default.
- W4229027288 hasConcept C154945302 @default.
- W4229027288 hasConcept C157657479 @default.
- W4229027288 hasConcept C159620131 @default.
- W4229027288 hasConcept C162324750 @default.
- W4229027288 hasConcept C17744445 @default.
- W4229027288 hasConcept C198082294 @default.
- W4229027288 hasConcept C199539241 @default.
- W4229027288 hasConcept C25343380 @default.
- W4229027288 hasConcept C27511587 @default.
- W4229027288 hasConcept C2776359362 @default.
- W4229027288 hasConcept C2776401178 @default.
- W4229027288 hasConcept C31972630 @default.
- W4229027288 hasConcept C33923547 @default.
- W4229027288 hasConcept C36464697 @default.
- W4229027288 hasConcept C41008148 @default.
- W4229027288 hasConcept C41608201 @default.
- W4229027288 hasConcept C41895202 @default.
- W4229027288 hasConcept C44291984 @default.
- W4229027288 hasConcept C94625758 @default.
- W4229027288 hasConceptScore W4229027288C10138342 @default.
- W4229027288 hasConceptScore W4229027288C105795698 @default.
- W4229027288 hasConceptScore W4229027288C115961682 @default.
- W4229027288 hasConceptScore W4229027288C124101348 @default.
- W4229027288 hasConceptScore W4229027288C138885662 @default.
- W4229027288 hasConceptScore W4229027288C153180895 @default.
- W4229027288 hasConceptScore W4229027288C154945302 @default.
- W4229027288 hasConceptScore W4229027288C157657479 @default.
- W4229027288 hasConceptScore W4229027288C159620131 @default.
- W4229027288 hasConceptScore W4229027288C162324750 @default.
- W4229027288 hasConceptScore W4229027288C17744445 @default.
- W4229027288 hasConceptScore W4229027288C198082294 @default.
- W4229027288 hasConceptScore W4229027288C199539241 @default.
- W4229027288 hasConceptScore W4229027288C25343380 @default.
- W4229027288 hasConceptScore W4229027288C27511587 @default.
- W4229027288 hasConceptScore W4229027288C2776359362 @default.
- W4229027288 hasConceptScore W4229027288C2776401178 @default.