Matches in SemOpenAlex for { <https://semopenalex.org/work/W4382457317> ?p ?o ?g. }
Showing items 1 to 63 of 63, with 100 items per page.
- W4382457317 endingPage "934" @default.
- W4382457317 startingPage "926" @default.
- W4382457317 abstract "Large-scale cross-modal pre-training paradigms have recently shown ubiquitous success on a wide range of downstream tasks, e.g., zero-shot classification, retrieval and image captioning. However, their successes highly rely on the scale and quality of web-crawled data that naturally contain much incomplete and noisy information (e.g., wrong or irrelevant contents). Existing works either design manual rules to clean data or generate pseudo-targets as auxiliary signals for reducing noise impact, which do not explicitly tackle both the incorrect and incomplete challenges at the same time. In this paper, to automatically mitigate the impact of noise by solely mining over existing data, we propose a principled Noise-robust Language-Image Pre-training framework (NLIP) to stabilize pre-training via two schemes: noise-harmonization and noise-completion. First, in noise-harmonization scheme, NLIP estimates the noise probability of each pair according to the memorization effect of cross-modal transformers, then adopts noise-adaptive regularization to harmonize the cross-modal alignments with varying degrees. Second, in noise-completion scheme, to enrich the missing object information of text, NLIP injects a concept-conditioned cross-modal decoder to obtain semantic-consistent synthetic captions to complete noisy ones, which uses the retrieved visual concepts (i.e., objects’ names) for the corresponding image to guide captioning generation. By collaboratively optimizing noise-harmonization and noise-completion schemes, our NLIP can alleviate the common noise effects during image-text pre-training in a more efficient way. Extensive experiments show the significant performance improvements of our NLIP using only 26M data over existing pre-trained models (e.g., CLIP, FILIP and BLIP) on 12 zero-shot classification datasets (e.g., +8.6% over CLIP on average accuracy), MSCOCO image captioning (e.g., +1.9 over BLIP trained with 129M data on CIDEr) and zero-shot image-text retrieval tasks." @default.
- W4382457317 created "2023-06-29" @default.
- W4382457317 creator A5001888307 @default.
- W4382457317 creator A5018425393 @default.
- W4382457317 creator A5028785469 @default.
- W4382457317 creator A5041457457 @default.
- W4382457317 creator A5047878798 @default.
- W4382457317 creator A5066602150 @default.
- W4382457317 creator A5075195203 @default.
- W4382457317 date "2023-06-26" @default.
- W4382457317 modified "2023-10-16" @default.
- W4382457317 title "NLIP: Noise-Robust Language-Image Pre-training" @default.
- W4382457317 doi "https://doi.org/10.1609/aaai.v37i1.25172" @default.
- W4382457317 hasPublicationYear "2023" @default.
- W4382457317 type Work @default.
- W4382457317 citedByCount "0" @default.
- W4382457317 crossrefType "journal-article" @default.
- W4382457317 hasAuthorship W4382457317A5001888307 @default.
- W4382457317 hasAuthorship W4382457317A5018425393 @default.
- W4382457317 hasAuthorship W4382457317A5028785469 @default.
- W4382457317 hasAuthorship W4382457317A5041457457 @default.
- W4382457317 hasAuthorship W4382457317A5047878798 @default.
- W4382457317 hasAuthorship W4382457317A5066602150 @default.
- W4382457317 hasAuthorship W4382457317A5075195203 @default.
- W4382457317 hasBestOaLocation W43824573171 @default.
- W4382457317 hasConcept C115961682 @default.
- W4382457317 hasConcept C124101348 @default.
- W4382457317 hasConcept C154945302 @default.
- W4382457317 hasConcept C157657479 @default.
- W4382457317 hasConcept C163294075 @default.
- W4382457317 hasConcept C29265498 @default.
- W4382457317 hasConcept C35772409 @default.
- W4382457317 hasConcept C41008148 @default.
- W4382457317 hasConcept C99498987 @default.
- W4382457317 hasConceptScore W4382457317C115961682 @default.
- W4382457317 hasConceptScore W4382457317C124101348 @default.
- W4382457317 hasConceptScore W4382457317C154945302 @default.
- W4382457317 hasConceptScore W4382457317C157657479 @default.
- W4382457317 hasConceptScore W4382457317C163294075 @default.
- W4382457317 hasConceptScore W4382457317C29265498 @default.
- W4382457317 hasConceptScore W4382457317C35772409 @default.
- W4382457317 hasConceptScore W4382457317C41008148 @default.
- W4382457317 hasConceptScore W4382457317C99498987 @default.
- W4382457317 hasIssue "1" @default.
- W4382457317 hasLocation W43824573171 @default.
- W4382457317 hasOpenAccess W4382457317 @default.
- W4382457317 hasPrimaryLocation W43824573171 @default.
- W4382457317 hasRelatedWork W1495420449 @default.
- W4382457317 hasRelatedWork W1972186641 @default.
- W4382457317 hasRelatedWork W2090859501 @default.
- W4382457317 hasRelatedWork W2137559967 @default.
- W4382457317 hasRelatedWork W2143126250 @default.
- W4382457317 hasRelatedWork W2512846683 @default.
- W4382457317 hasRelatedWork W2923366293 @default.
- W4382457317 hasRelatedWork W3008515501 @default.
- W4382457317 hasRelatedWork W3190667789 @default.
- W4382457317 hasRelatedWork W4379537582 @default.
- W4382457317 hasVolume "37" @default.
- W4382457317 isParatext "false" @default.
- W4382457317 isRetracted "false" @default.
- W4382457317 workType "article" @default.
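The listing above is the result of the quad pattern shown in the header. A minimal sketch of reproducing such a query programmatically is below; the public SPARQL endpoint URL (`https://semopenalex.org/sparql`) and the JSON results format are assumptions about the SemOpenAlex service, so adjust them to your environment before running.

```python
# Sketch: building (and optionally running) a SPARQL query for all
# predicate/object pairs of a SemOpenAlex work, as in the listing above.
import json
import urllib.parse
import urllib.request

WORK_URI = "https://semopenalex.org/work/W4382457317"


def build_query(work_uri: str) -> str:
    """Turn the quad pattern { <work> ?p ?o ?g } into a SELECT query."""
    return f"SELECT ?p ?o WHERE {{ <{work_uri}> ?p ?o . }}"


query = build_query(WORK_URI)

# To actually execute it (requires network access; endpoint URL is an
# assumption), one might do:
#
# url = "https://semopenalex.org/sparql?" + urllib.parse.urlencode(
#     {"query": query, "format": "json"})
# with urllib.request.urlopen(url) as resp:
#     bindings = json.load(resp)["results"]["bindings"]
#     # Per the listing above, this work has 63 triples.
#     print(len(bindings))
```

Building the query as a plain string keeps the sketch dependency-free; a library such as SPARQLWrapper would handle encoding and result parsing for real use.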