Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386072307> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W4386072307 abstract "Recent advances on text-to-image generation have witnessed the rise of diffusion models which act as powerful generative models. Nevertheless, it is not trivial to exploit such latent variable models to capture the dependency among discrete words and meanwhile pursue complex visual-language alignment in image captioning. In this paper, we break the deeply rooted conventions in learning Transformer-based encoder-decoder, and propose a new diffusion model based paradigm tailored for image captioning, namely Semantic-Conditional Diffusion Networks (SCD-Net). Technically, for each input image, we first search the semantically relevant sentences via cross-modal retrieval model to convey the comprehensive semantic information. The rich semantics are further regarded as semantic prior to trigger the learning of Diffusion Transformer, which produces the output sentence in a diffusion process. In SCD-Net, multiple Diffusion Transformer structures are stacked to progressively strengthen the output sentence with better visional-language alignment and linguistical coherence in a cascaded manner. Furthermore, to stabilize the diffusion process, a new self-critical sequence training strategy is designed to guide the learning of SCD-Net with the knowledge of a standard autoregressive Transformer model. Extensive experiments on COCO dataset demonstrate the promising potential of using diffusion models in the challenging image captioning task. Source code is available at" @default.
- W4386072307 created "2023-08-23" @default.
- W4386072307 creator A5017597537 @default.
- W4386072307 creator A5022947394 @default.
- W4386072307 creator A5041154840 @default.
- W4386072307 creator A5049761910 @default.
- W4386072307 creator A5061525421 @default.
- W4386072307 creator A5085403640 @default.
- W4386072307 creator A5088760097 @default.
- W4386072307 date "2023-06-01" @default.
- W4386072307 modified "2023-09-27" @default.
- W4386072307 title "Semantic-Conditional Diffusion Networks for Image Captioning*" @default.
- W4386072307 cites W1895577753 @default.
- W4386072307 cites W1905882502 @default.
- W4386072307 cites W1947481528 @default.
- W4386072307 cites W1956340063 @default.
- W4386072307 cites W2108598243 @default.
- W4386072307 cites W2277195237 @default.
- W4386072307 cites W2302086703 @default.
- W4386072307 cites W2745461083 @default.
- W4386072307 cites W2885013662 @default.
- W4386072307 cites W2886641317 @default.
- W4386072307 cites W2887585070 @default.
- W4386072307 cites W2963084599 @default.
- W4386072307 cites W2963101956 @default.
- W4386072307 cites W2965697393 @default.
- W4386072307 cites W2986670728 @default.
- W4386072307 cites W3034655362 @default.
- W4386072307 cites W3196122027 @default.
- W4386072307 cites W3205981128 @default.
- W4386072307 cites W3210150990 @default.
- W4386072307 cites W4285345750 @default.
- W4386072307 cites W4313131769 @default.
- W4386072307 doi "https://doi.org/10.1109/cvpr52729.2023.02237" @default.
- W4386072307 hasPublicationYear "2023" @default.
- W4386072307 type Work @default.
- W4386072307 citedByCount "0" @default.
- W4386072307 crossrefType "proceedings-article" @default.
- W4386072307 hasAuthorship W4386072307A5017597537 @default.
- W4386072307 hasAuthorship W4386072307A5022947394 @default.
- W4386072307 hasAuthorship W4386072307A5041154840 @default.
- W4386072307 hasAuthorship W4386072307A5049761910 @default.
- W4386072307 hasAuthorship W4386072307A5061525421 @default.
- W4386072307 hasAuthorship W4386072307A5085403640 @default.
- W4386072307 hasAuthorship W4386072307A5088760097 @default.
- W4386072307 hasConcept C111919701 @default.
- W4386072307 hasConcept C115961682 @default.
- W4386072307 hasConcept C118505674 @default.
- W4386072307 hasConcept C121332964 @default.
- W4386072307 hasConcept C137293760 @default.
- W4386072307 hasConcept C154945302 @default.
- W4386072307 hasConcept C157657479 @default.
- W4386072307 hasConcept C165801399 @default.
- W4386072307 hasConcept C204321447 @default.
- W4386072307 hasConcept C2777530160 @default.
- W4386072307 hasConcept C41008148 @default.
- W4386072307 hasConcept C62520636 @default.
- W4386072307 hasConcept C66322947 @default.
- W4386072307 hasConceptScore W4386072307C111919701 @default.
- W4386072307 hasConceptScore W4386072307C115961682 @default.
- W4386072307 hasConceptScore W4386072307C118505674 @default.
- W4386072307 hasConceptScore W4386072307C121332964 @default.
- W4386072307 hasConceptScore W4386072307C137293760 @default.
- W4386072307 hasConceptScore W4386072307C154945302 @default.
- W4386072307 hasConceptScore W4386072307C157657479 @default.
- W4386072307 hasConceptScore W4386072307C165801399 @default.
- W4386072307 hasConceptScore W4386072307C204321447 @default.
- W4386072307 hasConceptScore W4386072307C2777530160 @default.
- W4386072307 hasConceptScore W4386072307C41008148 @default.
- W4386072307 hasConceptScore W4386072307C62520636 @default.
- W4386072307 hasConceptScore W4386072307C66322947 @default.
- W4386072307 hasLocation W43860723071 @default.
- W4386072307 hasOpenAccess W4386072307 @default.
- W4386072307 hasPrimaryLocation W43860723071 @default.
- W4386072307 hasRelatedWork W2547835662 @default.
- W4386072307 hasRelatedWork W3025136821 @default.
- W4386072307 hasRelatedWork W3035237998 @default.
- W4386072307 hasRelatedWork W4224046780 @default.
- W4386072307 hasRelatedWork W4281560470 @default.
- W4386072307 hasRelatedWork W4312545247 @default.
- W4386072307 hasRelatedWork W4312845724 @default.
- W4386072307 hasRelatedWork W4364297074 @default.
- W4386072307 hasRelatedWork W4384210086 @default.
- W4386072307 hasRelatedWork W4386076661 @default.
- W4386072307 isParatext "false" @default.
- W4386072307 isRetracted "false" @default.
- W4386072307 workType "article" @default.