Matches in SemOpenAlex for { <https://semopenalex.org/work/W4292263758> ?p ?o ?g. }
- W4292263758 endingPage "268" @default.
- W4292263758 startingPage "257" @default.
- W4292263758 abstract "Visual storytelling aims at producing a narrative paragraph for a given photo album automatically. It introduces more new challenges than individual image paragraph descriptions, mainly due to the difficulty in preserving coherent topics and in generating diverse phrases to depict the rich content of a photo album. Existing attention-based models that lack higher-level guiding information always result in a deviation between the generated sentence and the topic expressed by the image. In addition, these widely applied language generation approaches employing standard beam search tend to produce monotonous descriptions. In this work, a coherent visual storytelling (CoVS) framework is designed to address the above-mentioned problems. Specifically, in the encoding phase, an image sequence encoder is designed to efficiently extract visual features of the input photo album. Then, the novel parallel top-down visual and topic attention (PTDVTA) decoder is constructed via a topic-aware neural network, a parallel top-down attention model, and a coherent language generator. Concretely, visual attention focuses on the attributes and the relationships of the objects, while topic attention integrating a topic-aware neural network could improve the coherence of generated sentences. Eventually, a phrase beam search algorithm with $n$-gram Hamming diversity is further designed to optimize the expression diversity of the generated story. To justify the proposed CoVS framework, extensive experiments are conducted on the VIST dataset, which shows that CoVS can automatically generate coherent and diverse stories in a more natural way. Moreover, CoVS obtains better performance than state-of-the-art baselines on BLEU-4 and METEOR scores, while maintaining good CIDEr and ROUGE_L scores. The source code of this work can be found at https://mic.tongji.edu.cn." @default.
- W4292263758 created "2022-08-19" @default.
- W4292263758 creator A5057852825 @default.
- W4292263758 creator A5058982350 @default.
- W4292263758 creator A5061965819 @default.
- W4292263758 date "2023-01-01" @default.
- W4292263758 modified "2023-10-13" @default.
- W4292263758 title "Coherent Visual Storytelling via Parallel Top-Down Visual and Topic Attention" @default.
- W4292263758 cites W1956340063 @default.
- W4292263758 cites W2001082470 @default.
- W4292263758 cites W2108325777 @default.
- W4292263758 cites W2250533720 @default.
- W4292263758 cites W2302086703 @default.
- W4292263758 cites W2552161745 @default.
- W4292263758 cites W2558834163 @default.
- W4292263758 cites W2575842049 @default.
- W4292263758 cites W2584271190 @default.
- W4292263758 cites W2605045867 @default.
- W4292263758 cites W2625940279 @default.
- W4292263758 cites W2741196623 @default.
- W4292263758 cites W2745461083 @default.
- W4292263758 cites W2766711654 @default.
- W4292263758 cites W2795151422 @default.
- W4292263758 cites W2808663243 @default.
- W4292263758 cites W2883891001 @default.
- W4292263758 cites W2888321701 @default.
- W4292263758 cites W2890531016 @default.
- W4292263758 cites W2953486038 @default.
- W4292263758 cites W2963033554 @default.
- W4292263758 cites W2963187786 @default.
- W4292263758 cites W2963668753 @default.
- W4292263758 cites W2963811641 @default.
- W4292263758 cites W2963829244 @default.
- W4292263758 cites W2964546107 @default.
- W4292263758 cites W2965112138 @default.
- W4292263758 cites W2966774251 @default.
- W4292263758 cites W2987862245 @default.
- W4292263758 cites W2996941455 @default.
- W4292263758 cites W2998106530 @default.
- W4292263758 cites W2998303222 @default.
- W4292263758 cites W2998559045 @default.
- W4292263758 cites W2998813101 @default.
- W4292263758 cites W3036848992 @default.
- W4292263758 cites W3082226086 @default.
- W4292263758 cites W3087871082 @default.
- W4292263758 cites W3094176136 @default.
- W4292263758 cites W3113632293 @default.
- W4292263758 cites W3128339783 @default.
- W4292263758 cites W3131251978 @default.
- W4292263758 cites W3132408010 @default.
- W4292263758 cites W3134875898 @default.
- W4292263758 cites W3136792391 @default.
- W4292263758 cites W3162694035 @default.
- W4292263758 cites W3173770572 @default.
- W4292263758 cites W3176313262 @default.
- W4292263758 cites W3176565349 @default.
- W4292263758 cites W4285125423 @default.
- W4292263758 cites W4288083805 @default.
- W4292263758 cites W3106477714 @default.
- W4292263758 doi "https://doi.org/10.1109/tcsvt.2022.3199603" @default.
- W4292263758 hasPublicationYear "2023" @default.
- W4292263758 type Work @default.
- W4292263758 citedByCount "0" @default.
- W4292263758 crossrefType "journal-article" @default.
- W4292263758 hasAuthorship W4292263758A5057852825 @default.
- W4292263758 hasAuthorship W4292263758A5058982350 @default.
- W4292263758 hasAuthorship W4292263758A5061965819 @default.
- W4292263758 hasConcept C115961682 @default.
- W4292263758 hasConcept C121332964 @default.
- W4292263758 hasConcept C136764020 @default.
- W4292263758 hasConcept C138885662 @default.
- W4292263758 hasConcept C154945302 @default.
- W4292263758 hasConcept C157657479 @default.
- W4292263758 hasConcept C158495155 @default.
- W4292263758 hasConcept C163258240 @default.
- W4292263758 hasConcept C199033989 @default.
- W4292263758 hasConcept C204321447 @default.
- W4292263758 hasConcept C23123220 @default.
- W4292263758 hasConcept C2776224158 @default.
- W4292263758 hasConcept C2776538412 @default.
- W4292263758 hasConcept C2777206241 @default.
- W4292263758 hasConcept C2777530160 @default.
- W4292263758 hasConcept C2780992000 @default.
- W4292263758 hasConcept C41008148 @default.
- W4292263758 hasConcept C41895202 @default.
- W4292263758 hasConcept C62520636 @default.
- W4292263758 hasConceptScore W4292263758C115961682 @default.
- W4292263758 hasConceptScore W4292263758C121332964 @default.
- W4292263758 hasConceptScore W4292263758C136764020 @default.
- W4292263758 hasConceptScore W4292263758C138885662 @default.
- W4292263758 hasConceptScore W4292263758C154945302 @default.
- W4292263758 hasConceptScore W4292263758C157657479 @default.
- W4292263758 hasConceptScore W4292263758C158495155 @default.
- W4292263758 hasConceptScore W4292263758C163258240 @default.
- W4292263758 hasConceptScore W4292263758C199033989 @default.
- W4292263758 hasConceptScore W4292263758C204321447 @default.
- W4292263758 hasConceptScore W4292263758C23123220 @default.
- W4292263758 hasConceptScore W4292263758C2776224158 @default.