Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386043579> ?p ?o ?g. }
Showing items 1 to 83 of 83, with 100 items per page.
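
The listing below can be reproduced with an equivalent SPARQL query. A minimal sketch, assuming the public SemOpenAlex endpoint at https://semopenalex.org/sparql (the endpoint URL is an assumption, not part of this listing); the `?g` in the header's pattern binds the named graph, which resolves to the default graph (`@default`) for every match here:

```sparql
# Fetch every predicate-object pair stated about the work,
# mirroring the { <work> ?p ?o ?g. } pattern in the page header.
# Endpoint (assumed): https://semopenalex.org/sparql
SELECT ?p ?o
WHERE {
  <https://semopenalex.org/work/W4386043579> ?p ?o .
}
LIMIT 100
```

The `LIMIT 100` matches the listing's 100-items-per-page setting; later pages would add a corresponding `OFFSET`.
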
- W4386043579 abstract "Relational Language-Image Pre-training (RLIP) aims to align vision representations with relational texts, thereby advancing the capability of relational reasoning in computer vision tasks. However, hindered by the slow convergence of RLIPv1 architecture and the limited availability of existing scene graph data, scaling RLIPv1 is challenging. In this paper, we propose RLIPv2, a fast converging model that enables the scaling of relational pre-training to large-scale pseudo-labelled scene graph data. To enable fast scaling, RLIPv2 introduces Asymmetric Language-Image Fusion (ALIF), a mechanism that facilitates earlier and deeper gated cross-modal fusion with sparsified language encoding layers. ALIF leads to comparable or better performance than RLIPv1 in a fraction of the time for pre-training and fine-tuning. To obtain scene graph data at scale, we extend object detection datasets with free-form relation labels by introducing a captioner (e.g., BLIP) and a designed Relation Tagger. The Relation Tagger assigns BLIP-generated relation texts to region pairs, thus enabling larger-scale relational pre-training. Through extensive experiments conducted on Human-Object Interaction Detection and Scene Graph Generation, RLIPv2 shows state-of-the-art performance on three benchmarks under fully-finetuning, few-shot and zero-shot settings. Notably, the largest RLIPv2 achieves 23.29 mAP on HICO-DET without any fine-tuning, yields 32.22 mAP with just 1% data and yields 45.09 mAP with 100% data. Code and models are publicly available at https://github.com/JacobYuan7/RLIPv2." @default.
- W4386043579 created "2023-08-22" @default.
- W4386043579 creator A5015480866 @default.
- W4386043579 creator A5039538233 @default.
- W4386043579 creator A5053261798 @default.
- W4386043579 creator A5061252274 @default.
- W4386043579 creator A5065374358 @default.
- W4386043579 creator A5072476367 @default.
- W4386043579 creator A5074916544 @default.
- W4386043579 creator A5080934752 @default.
- W4386043579 creator A5082388713 @default.
- W4386043579 creator A5090528235 @default.
- W4386043579 date "2023-08-18" @default.
- W4386043579 modified "2023-09-26" @default.
- W4386043579 title "RLIPv2: Fast Scaling of Relational Language-Image Pre-training" @default.
- W4386043579 doi "https://doi.org/10.48550/arxiv.2308.09351" @default.
- W4386043579 hasPublicationYear "2023" @default.
- W4386043579 type Work @default.
- W4386043579 citedByCount "0" @default.
- W4386043579 crossrefType "posted-content" @default.
- W4386043579 hasAuthorship W4386043579A5015480866 @default.
- W4386043579 hasAuthorship W4386043579A5039538233 @default.
- W4386043579 hasAuthorship W4386043579A5053261798 @default.
- W4386043579 hasAuthorship W4386043579A5061252274 @default.
- W4386043579 hasAuthorship W4386043579A5065374358 @default.
- W4386043579 hasAuthorship W4386043579A5072476367 @default.
- W4386043579 hasAuthorship W4386043579A5074916544 @default.
- W4386043579 hasAuthorship W4386043579A5080934752 @default.
- W4386043579 hasAuthorship W4386043579A5082388713 @default.
- W4386043579 hasAuthorship W4386043579A5090528235 @default.
- W4386043579 hasBestOaLocation W43860435791 @default.
- W4386043579 hasConcept C115961682 @default.
- W4386043579 hasConcept C124101348 @default.
- W4386043579 hasConcept C132525143 @default.
- W4386043579 hasConcept C154945302 @default.
- W4386043579 hasConcept C160633673 @default.
- W4386043579 hasConcept C177264268 @default.
- W4386043579 hasConcept C179372163 @default.
- W4386043579 hasConcept C199360897 @default.
- W4386043579 hasConcept C204321447 @default.
- W4386043579 hasConcept C205711294 @default.
- W4386043579 hasConcept C2524010 @default.
- W4386043579 hasConcept C25343380 @default.
- W4386043579 hasConcept C2776760102 @default.
- W4386043579 hasConcept C33923547 @default.
- W4386043579 hasConcept C41008148 @default.
- W4386043579 hasConcept C5655090 @default.
- W4386043579 hasConcept C80444323 @default.
- W4386043579 hasConcept C99844830 @default.
- W4386043579 hasConceptScore W4386043579C115961682 @default.
- W4386043579 hasConceptScore W4386043579C124101348 @default.
- W4386043579 hasConceptScore W4386043579C132525143 @default.
- W4386043579 hasConceptScore W4386043579C154945302 @default.
- W4386043579 hasConceptScore W4386043579C160633673 @default.
- W4386043579 hasConceptScore W4386043579C177264268 @default.
- W4386043579 hasConceptScore W4386043579C179372163 @default.
- W4386043579 hasConceptScore W4386043579C199360897 @default.
- W4386043579 hasConceptScore W4386043579C204321447 @default.
- W4386043579 hasConceptScore W4386043579C205711294 @default.
- W4386043579 hasConceptScore W4386043579C2524010 @default.
- W4386043579 hasConceptScore W4386043579C25343380 @default.
- W4386043579 hasConceptScore W4386043579C2776760102 @default.
- W4386043579 hasConceptScore W4386043579C33923547 @default.
- W4386043579 hasConceptScore W4386043579C41008148 @default.
- W4386043579 hasConceptScore W4386043579C5655090 @default.
- W4386043579 hasConceptScore W4386043579C80444323 @default.
- W4386043579 hasConceptScore W4386043579C99844830 @default.
- W4386043579 hasLocation W43860435791 @default.
- W4386043579 hasOpenAccess W4386043579 @default.
- W4386043579 hasPrimaryLocation W43860435791 @default.
- W4386043579 hasRelatedWork W2090093270 @default.
- W4386043579 hasRelatedWork W2114889067 @default.
- W4386043579 hasRelatedWork W2293457016 @default.
- W4386043579 hasRelatedWork W2373724792 @default.
- W4386043579 hasRelatedWork W2719428933 @default.
- W4386043579 hasRelatedWork W2789919619 @default.
- W4386043579 hasRelatedWork W3115043162 @default.
- W4386043579 hasRelatedWork W4313889511 @default.
- W4386043579 hasRelatedWork W1551406738 @default.
- W4386043579 hasRelatedWork W2610387714 @default.
- W4386043579 isParatext "false" @default.
- W4386043579 isRetracted "false" @default.
- W4386043579 workType "article" @default.
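
Objects such as the creator, hasConcept, and hasRelatedWork values above are opaque identifiers that are themselves dereferenceable SemOpenAlex resources. A minimal follow-up sketch, assuming concept IRIs live under https://semopenalex.org/concept/ (an assumption inferred from the /work/ IRI pattern in the header), lists everything stated about one of the tagged concepts:

```sparql
# Dereference one hasConcept object (C41008148) to inspect its
# own predicate-object pairs, e.g. its label and related concepts.
# Concept IRI pattern (assumed): https://semopenalex.org/concept/{id}
SELECT ?p ?o
WHERE {
  <https://semopenalex.org/concept/C41008148> ?p ?o .
}
```

The same lookup should work for the A-prefixed author IDs (creator) and W-prefixed work IDs (hasRelatedWork), assuming analogous /author/ and /work/ IRI paths.
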