SemOpenAlex

Matches in SemOpenAlex for { <https://semopenalex.org/work/W4379259759> ?p ?o ?g. }
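
The match pattern above can be written as an ordinary SPARQL query. The sketch below only builds the query string. It assumes that, because every row in this listing is marked `@default`, a plain triple pattern against the default graph is enough. The function name `build_work_query` and the `limit` parameter are hypothetical, and no endpoint is contacted.

```python
def build_work_query(work_iri: str, limit: int = 100) -> str:
    """Build a SPARQL SELECT returning every predicate/object pair of one work.

    Hypothetical helper for illustration; it only assembles text.
    """
    # Reject characters that would break out of the <...> IRI delimiters.
    if any(ch in work_iri for ch in '<>" \n'):
        raise ValueError("IRI contains characters that are not allowed")
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{work_iri}> ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = build_work_query("https://semopenalex.org/work/W4379259759")
print(query)
```

The default limit of 100 mirrors the page size of this listing; a store that keeps its data in named graphs would need a `GRAPH ?g { ... }` wrapper instead.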

W4379259759 abstract "Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis (e.g., text-to-image generation). On the other hand, fine-tuning pre-trained models with discriminative or generative capabilities such as CLIP and Stable Diffusion on domain-specific datasets has shown to be effective in various tasks by adapting to specific domains. However, few studies have explored the possibility of learning both discriminative and generative capabilities and leveraging their synergistic effects to create a powerful and personalized multimodal model during fine-tuning. This paper presents UniDiff, a unified multi-modal model that integrates image-text contrastive learning (ITC), text-conditioned image synthesis learning (IS), and reciprocal semantic consistency modeling (RSC). UniDiff effectively learns aligned semantics and mitigates the issue of semantic collapse during fine-tuning on small datasets by leveraging RSC on visual features from CLIP and diffusion models, without altering the pre-trained model's basic architecture. UniDiff demonstrates versatility in both multi-modal understanding and generative tasks. Experimental results on three datasets (Fashion-man, Fashion-woman, and E-commercial Product) showcase substantial enhancements in vision-language retrieval and text-to-image generation, illustrating the advantages of combining discriminative and generative fine-tuning. The proposed UniDiff model establishes a robust pipeline for personalized modeling and serves as a benchmark for future comparisons in the field." @default.
W4379259759 created "2023-06-04" @default.
W4379259759 creator A5001888307 @default.
W4379259759 creator A5017205177 @default.
W4379259759 creator A5034340826 @default.
W4379259759 creator A5047878798 @default.
W4379259759 creator A5073153719 @default.
W4379259759 creator A5075329194 @default.
W4379259759 creator A5082181196 @default.
W4379259759 date "2023-06-01" @default.
W4379259759 modified "2023-10-14" @default.
W4379259759 title "UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning" @default.
W4379259759 doi "https://doi.org/10.48550/arxiv.2306.00813" @default.
W4379259759 hasPublicationYear "2023" @default.
W4379259759 type Work @default.
W4379259759 citedByCount "0" @default.
W4379259759 crossrefType "posted-content" @default.
W4379259759 hasAuthorship W4379259759A5001888307 @default.
W4379259759 hasAuthorship W4379259759A5017205177 @default.
W4379259759 hasAuthorship W4379259759A5034340826 @default.
W4379259759 hasAuthorship W4379259759A5047878798 @default.
W4379259759 hasAuthorship W4379259759A5073153719 @default.
W4379259759 hasAuthorship W4379259759A5075329194 @default.
W4379259759 hasAuthorship W4379259759A5082181196 @default.
W4379259759 hasBestOaLocation W43792597591 @default.
W4379259759 hasConcept C119857082 @default.
W4379259759 hasConcept C13280743 @default.
W4379259759 hasConcept C154945302 @default.
W4379259759 hasConcept C167966045 @default.
W4379259759 hasConcept C184337299 @default.
W4379259759 hasConcept C185798385 @default.
W4379259759 hasConcept C199360897 @default.
W4379259759 hasConcept C204321447 @default.
W4379259759 hasConcept C205649164 @default.
W4379259759 hasConcept C2776436953 @default.
W4379259759 hasConcept C39890363 @default.
W4379259759 hasConcept C41008148 @default.
W4379259759 hasConcept C43521106 @default.
W4379259759 hasConcept C89600930 @default.
W4379259759 hasConcept C97931131 @default.
W4379259759 hasConceptScore W4379259759C119857082 @default.
W4379259759 hasConceptScore W4379259759C13280743 @default.
W4379259759 hasConceptScore W4379259759C154945302 @default.
W4379259759 hasConceptScore W4379259759C167966045 @default.
W4379259759 hasConceptScore W4379259759C184337299 @default.
W4379259759 hasConceptScore W4379259759C185798385 @default.
W4379259759 hasConceptScore W4379259759C199360897 @default.
W4379259759 hasConceptScore W4379259759C204321447 @default.
W4379259759 hasConceptScore W4379259759C205649164 @default.
W4379259759 hasConceptScore W4379259759C2776436953 @default.
W4379259759 hasConceptScore W4379259759C39890363 @default.
W4379259759 hasConceptScore W4379259759C41008148 @default.
W4379259759 hasConceptScore W4379259759C43521106 @default.
W4379259759 hasConceptScore W4379259759C89600930 @default.
W4379259759 hasConceptScore W4379259759C97931131 @default.
W4379259759 hasLocation W43792597591 @default.
W4379259759 hasOpenAccess W4379259759 @default.
W4379259759 hasPrimaryLocation W43792597591 @default.
W4379259759 hasRelatedWork W1534961803 @default.
W4379259759 hasRelatedWork W2171258889 @default.
W4379259759 hasRelatedWork W2770426046 @default.
W4379259759 hasRelatedWork W2949782066 @default.
W4379259759 hasRelatedWork W2991061294 @default.
W4379259759 hasRelatedWork W2994891734 @default.
W4379259759 hasRelatedWork W3113145805 @default.
W4379259759 hasRelatedWork W4319759418 @default.
W4379259759 hasRelatedWork W4378711654 @default.
W4379259759 hasRelatedWork W2310403681 @default.
W4379259759 isParatext "false" @default.
W4379259759 isRetracted "false" @default.
W4379259759 workType "article" @default.
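
Each row above has the shape `subject predicate object @graph.`, so the listing can be folded into one record per predicate. The following is a minimal sketch under that assumption. `parse_row` and `group_by_predicate` are hypothetical names, and the regular expression is fitted to the rows shown here, not to any official SemOpenAlex export format.

```python
import re
from collections import defaultdict

# subject, predicate, object (may contain spaces), then "@graph." at line end.
ROW = re.compile(r'^(\S+)\s+(\S+)\s+(.*?)\s+@(\S+?)\.$')


def parse_row(line):
    """Return (subject, predicate, object, graph), or None if the line is not a row."""
    match = ROW.match(line.strip())
    if match is None:
        return None
    subject, predicate, obj, graph = match.groups()
    # Literal values are wrapped in double quotes; strip one surrounding pair.
    if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
        obj = obj[1:-1]
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect every object value under its predicate, preserving row order."""
    grouped = defaultdict(list)
    for line in lines:
        row = parse_row(line)
        if row is not None:
            _, predicate, obj, _ = row
            grouped[predicate].append(obj)
    return dict(grouped)


sample = [
    'W4379259759 created "2023-06-04" @default.',
    'W4379259759 creator A5001888307 @default.',
    'W4379259759 creator A5017205177 @default.',
    'W4379259759 type Work @default.',
]
record = group_by_predicate(sample)
print(record["creator"])  # ['A5001888307', 'A5017205177']
```

Keeping a list per predicate matters here because predicates such as `creator`, `hasConcept`, and `hasRelatedWork` repeat across many rows.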