Matches in SemOpenAlex for { <https://semopenalex.org/work/W3110014757> ?p ?o ?g. }
- W3110014757 endingPage "1" @default.
- W3110014757 startingPage "1" @default.
- W3110014757 abstract "We propose scene graph auto-encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like captions. Intuitively, we humans use the inductive bias to compose collocations and contextual inferences in discourse. For example, when we see the relation a person on a bike, it is natural to replace on with ride and infer a person riding a bike on a road even when the road is not evident. Therefore, exploiting such bias as a language prior is expected to help the conventional encoder-decoder models reason as we humans and generate more descriptive captions. Specifically, we use the scene graph-a directed graph ( G) where an object node is connected by adjective nodes and relationship nodes-to represent the complex structural layout of both image ( I) and sentence ( S). In the language domain, we use SGAE to learn a dictionary set ( D) that helps reconstruct sentences in the S→ GS → D → S auto-encoding pipeline, where D encodes the desired language prior and the decoder learns to caption from such a prior; in the vision-language domain, we share D in the I→ GI → D → S pipeline and distill the knowledge of the language decoder of the auto-encoder to that of the encoder-decoder based image captioner to transfer the language inductive bias. In this way, the shared D provides hidden embeddings about descriptive collocations to the encoder-decoder and the distillation strategy teaches the encoder-decoder to transform these embeddings to human-like captions as the auto-encoder. Thanks to the scene graph representation, the shared dictionary set, and the Knowledge Distillation strategy, the inductive bias is transferred across domains in principle. We validate the effectiveness of SGAE on the challenging MS-COCO image captioning benchmark, where our SGAE-based single-model achieves a new state-of-the-art 129.6 CIDEr-D on the Karpathy split, and a competitive 126.6 CIDEr-D (c40) on the official server, which is even comparable to other ensemble models. Furthermore, we validate the transferability of SGAE on two more challenging settings: transferring inductive bias from other language corpora and unpaired image captioning. Once again, the results of both settings confirm the superiority of SGAE. The code is released in https://github.com/yangxuntu/SGAE." @default.
- W3110014757 created "2020-12-07" @default.
- W3110014757 creator A5017901486 @default.
- W3110014757 creator A5042324027 @default.
- W3110014757 creator A5064752243 @default.
- W3110014757 date "2020-01-01" @default.
- W3110014757 modified "2023-10-05" @default.
- W3110014757 title "Auto-encoding and Distilling Scene Graphs for Image Captioning" @default.
- W3110014757 cites W1895577753 @default.
- W3110014757 cites W1905882502 @default.
- W3110014757 cites W1931639407 @default.
- W3110014757 cites W1956340063 @default.
- W3110014757 cites W2077069816 @default.
- W3110014757 cites W2097606805 @default.
- W3110014757 cites W2139906443 @default.
- W3110014757 cites W2194775991 @default.
- W3110014757 cites W2250378130 @default.
- W3110014757 cites W2277195237 @default.
- W3110014757 cites W2302086703 @default.
- W3110014757 cites W2473930607 @default.
- W3110014757 cites W2506483933 @default.
- W3110014757 cites W2552161745 @default.
- W3110014757 cites W2558834163 @default.
- W3110014757 cites W2561715562 @default.
- W3110014757 cites W2570343428 @default.
- W3110014757 cites W2575842049 @default.
- W3110014757 cites W2579549467 @default.
- W3110014757 cites W2591644541 @default.
- W3110014757 cites W2600702321 @default.
- W3110014757 cites W2745461083 @default.
- W3110014757 cites W2795151422 @default.
- W3110014757 cites W2798441115 @default.
- W3110014757 cites W2885013662 @default.
- W3110014757 cites W2886970679 @default.
- W3110014757 cites W2887029921 @default.
- W3110014757 cites W2887585070 @default.
- W3110014757 cites W2890531016 @default.
- W3110014757 cites W2938603906 @default.
- W3110014757 cites W2962779575 @default.
- W3110014757 cites W2963037989 @default.
- W3110014757 cites W2963048642 @default.
- W3110014757 cites W2963084599 @default.
- W3110014757 cites W2963101956 @default.
- W3110014757 cites W2963150697 @default.
- W3110014757 cites W2963184176 @default.
- W3110014757 cites W2963305465 @default.
- W3110014757 cites W2963448089 @default.
- W3110014757 cites W2963448850 @default.
- W3110014757 cites W2963536419 @default.
- W3110014757 cites W2963656855 @default.
- W3110014757 cites W2963743213 @default.
- W3110014757 cites W2963762755 @default.
- W3110014757 cites W2963921921 @default.
- W3110014757 cites W2963938081 @default.
- W3110014757 cites W2964082701 @default.
- W3110014757 cites W2964189064 @default.
- W3110014757 cites W2965697393 @default.
- W3110014757 cites W2965833116 @default.
- W3110014757 cites W2969679616 @default.
- W3110014757 cites W2984121207 @default.
- W3110014757 cites W2987123286 @default.
- W3110014757 cites W2990069284 @default.
- W3110014757 cites W2990307191 @default.
- W3110014757 cites W2992478697 @default.
- W3110014757 cites W2999219213 @default.
- W3110014757 doi "https://doi.org/10.1109/tpami.2020.3042192" @default.
- W3110014757 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/33270557" @default.
- W3110014757 hasPublicationYear "2020" @default.
- W3110014757 type Work @default.
- W3110014757 sameAs 3110014757 @default.
- W3110014757 citedByCount "17" @default.
- W3110014757 countsByYear W31100147572021 @default.
- W3110014757 countsByYear W31100147572022 @default.
- W3110014757 countsByYear W31100147572023 @default.
- W3110014757 crossrefType "journal-article" @default.
- W3110014757 hasAuthorship W3110014757A5017901486 @default.
- W3110014757 hasAuthorship W3110014757A5042324027 @default.
- W3110014757 hasAuthorship W3110014757A5064752243 @default.
- W3110014757 hasConcept C111919701 @default.
- W3110014757 hasConcept C11413529 @default.
- W3110014757 hasConcept C115961682 @default.
- W3110014757 hasConcept C118505674 @default.
- W3110014757 hasConcept C125411270 @default.
- W3110014757 hasConcept C132525143 @default.
- W3110014757 hasConcept C137293760 @default.
- W3110014757 hasConcept C154945302 @default.
- W3110014757 hasConcept C157657479 @default.
- W3110014757 hasConcept C159246509 @default.
- W3110014757 hasConcept C162324750 @default.
- W3110014757 hasConcept C179372163 @default.
- W3110014757 hasConcept C187736073 @default.
- W3110014757 hasConcept C195324797 @default.
- W3110014757 hasConcept C197352929 @default.
- W3110014757 hasConcept C199360897 @default.
- W3110014757 hasConcept C204321447 @default.
- W3110014757 hasConcept C205711294 @default.
- W3110014757 hasConcept C2777530160 @default.
- W3110014757 hasConcept C2780451532 @default.
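
The abstract recorded above describes two mechanisms: a shared dictionary D that re-encodes scene-graph node embeddings in both the sentence (S → G_S → D → S) and image (I → G_I → D → S) pipelines, and a distillation step that transfers the language auto-encoder decoder's behavior to the image captioner. The sketch below is a minimal, hypothetical illustration of how such a soft dictionary lookup and a distillation loss could be wired up in PyTorch; it is not the authors' implementation (see the linked repository for that), and all module names, tensor sizes, and the KL-based distillation objective are assumptions made for demonstration.

```python
# Minimal sketch (not the authors' code) of two ideas from the abstract:
# (1) a shared, learnable dictionary D that re-encodes scene-graph node
#     embeddings via soft attention, and (2) a knowledge-distillation loss that
#     pushes the image captioner's word distributions toward those of the
#     language auto-encoder's decoder. Names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedDictionary(nn.Module):
    """Re-encode node embeddings as attention-weighted combinations of D's entries."""

    def __init__(self, num_entries: int = 512, dim: int = 64):
        super().__init__()
        self.entries = nn.Parameter(torch.randn(num_entries, dim) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_nodes, dim) graph node embeddings (from G_S or G_I)
        scores = x @ self.entries.t()        # (batch, num_nodes, num_entries)
        attn = F.softmax(scores, dim=-1)     # soft dictionary lookup
        return attn @ self.entries           # re-encoded embeddings, same shape as x


def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over per-step caption word distributions."""
    t = F.softmax(teacher_logits / temperature, dim=-1)
    log_s = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_s, t, reduction="batchmean") * temperature ** 2


if __name__ == "__main__":
    dictionary = SharedDictionary(num_entries=512, dim=64)  # small sizes for the demo
    sentence_graph = torch.randn(2, 7, 64)   # stand-in for G_S node embeddings
    image_graph = torch.randn(2, 9, 64)      # stand-in for G_I node embeddings
    # The same D serves both the S -> G_S -> D -> S and I -> G_I -> D -> S pipelines.
    re_s = dictionary(sentence_graph)
    re_i = dictionary(image_graph)
    # Toy decoder outputs: teacher = auto-encoder decoder, student = image captioner.
    teacher_logits = torch.randn(2, 12, 1000)                      # (batch, len, vocab)
    student_logits = torch.randn(2, 12, 1000, requires_grad=True)  # (batch, len, vocab)
    loss = distillation_loss(student_logits, teacher_logits)
    loss.backward()
    print(re_s.shape, re_i.shape, float(loss))
```

In this toy setup the single SharedDictionary instance serves both pipelines, mirroring the shared-D idea, while distillation_loss stands in for whatever objective the paper actually uses to align the two decoders.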