Matches in SemOpenAlex for { <https://semopenalex.org/work/W4313172891> ?p ?o ?g. }
- W4313172891 endingPage "302" @default.
- W4313172891 startingPage "284" @default.
- W4313172891 abstract "Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-language tasks by jointly learning visual and textual representations, which intuitively should also help Optical Character Recognition (OCR) tasks, given the rich visual and textual information in scene text images. However, these methods cannot cope well with OCR tasks because of the difficulty of both instance-level text encoding and image-text pair acquisition (i.e., images and the texts captured in them). This paper presents a weakly supervised pre-training method, oCLIP, which acquires effective scene text representations by jointly learning and aligning visual and textual information. Our network consists of an image encoder and a character-aware text encoder that extract visual and textual features, respectively, as well as a visual-textual decoder that models the interaction between textual and visual features for learning effective scene text representations. By learning textual features, the pre-trained model can attend to texts in images with character awareness. In addition, these designs enable learning from weakly annotated texts (i.e., partial texts in images without text bounding boxes), which greatly mitigates the data annotation constraint. Experiments on the weakly annotated images in ICDAR2019-LSVT show that our pre-trained model improves F-score by +2.5% and +4.8% when transferring its weights to other text detection and spotting networks, respectively. In addition, the proposed method consistently outperforms existing pre-training techniques across multiple public datasets (e.g., +3.2% and +1.3% for Total-Text and CTW1500)." @default.
- W4313172891 created "2023-01-06" @default.
- W4313172891 creator A5001534492 @default.
- W4313172891 creator A5013233914 @default.
- W4313172891 creator A5014726262 @default.
- W4313172891 creator A5023507910 @default.
- W4313172891 creator A5024075471 @default.
- W4313172891 creator A5068548252 @default.
- W4313172891 date "2022-01-01" @default.
- W4313172891 modified "2023-10-16" @default.
- W4313172891 title "Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting" @default.
- W4313172891 cites W2144554289 @default.
- W4313172891 cites W2194775991 @default.
- W4313172891 cites W2343052201 @default.
- W4313172891 cites W2605076167 @default.
- W4313172891 cites W2605982830 @default.
- W4313172891 cites W2784050770 @default.
- W4313172891 cites W2798450692 @default.
- W4313172891 cites W2810028092 @default.
- W4313172891 cites W2831607544 @default.
- W4313172891 cites W2873558679 @default.
- W4313172891 cites W2875814315 @default.
- W4313172891 cites W2902494497 @default.
- W4313172891 cites W2953606406 @default.
- W4313172891 cites W2962935569 @default.
- W4313172891 cites W2962986948 @default.
- W4313172891 cites W2963150697 @default.
- W4313172891 cites W2963299604 @default.
- W4313172891 cites W2963353821 @default.
- W4313172891 cites W2963647456 @default.
- W4313172891 cites W2964018263 @default.
- W4313172891 cites W2964294787 @default.
- W4313172891 cites W2964296749 @default.
- W4313172891 cites W2964685115 @default.
- W4313172891 cites W2967615747 @default.
- W4313172891 cites W2968226676 @default.
- W4313172891 cites W2970231061 @default.
- W4313172891 cites W2970910956 @default.
- W4313172891 cites W2983626510 @default.
- W4313172891 cites W2987563462 @default.
- W4313172891 cites W2988098900 @default.
- W4313172891 cites W2991626090 @default.
- W4313172891 cites W2997371611 @default.
- W4313172891 cites W2998356391 @default.
- W4313172891 cites W2998621280 @default.
- W4313172891 cites W3002942143 @default.
- W4313172891 cites W3003218881 @default.
- W4313172891 cites W3003990305 @default.
- W4313172891 cites W3005400651 @default.
- W4313172891 cites W3034500398 @default.
- W4313172891 cites W3034514377 @default.
- W4313172891 cites W3034792612 @default.
- W4313172891 cites W3035449864 @default.
- W4313172891 cites W3035679705 @default.
- W4313172891 cites W3090449556 @default.
- W4313172891 cites W3092619320 @default.
- W4313172891 cites W3093124244 @default.
- W4313172891 cites W3097932944 @default.
- W4313172891 cites W3099143471 @default.
- W4313172891 cites W3109097593 @default.
- W4313172891 cites W3110398855 @default.
- W4313172891 cites W3111172959 @default.
- W4313172891 cites W3116651605 @default.
- W4313172891 cites W3139822213 @default.
- W4313172891 cites W3159307593 @default.
- W4313172891 cites W3171030392 @default.
- W4313172891 cites W3172799005 @default.
- W4313172891 cites W3177684257 @default.
- W4313172891 cites W3181016597 @default.
- W4313172891 cites W3184364189 @default.
- W4313172891 cites W3186906052 @default.
- W4313172891 cites W3196976036 @default.
- W4313172891 cites W4312351507 @default.
- W4313172891 doi "https://doi.org/10.1007/978-3-031-19815-1_17" @default.
- W4313172891 hasPublicationYear "2022" @default.
- W4313172891 type Work @default.
- W4313172891 citedByCount "3" @default.
- W4313172891 countsByYear W43131728912023 @default.
- W4313172891 crossrefType "book-chapter" @default.
- W4313172891 hasAuthorship W4313172891A5001534492 @default.
- W4313172891 hasAuthorship W4313172891A5013233914 @default.
- W4313172891 hasAuthorship W4313172891A5014726262 @default.
- W4313172891 hasAuthorship W4313172891A5023507910 @default.
- W4313172891 hasAuthorship W4313172891A5024075471 @default.
- W4313172891 hasAuthorship W4313172891A5068548252 @default.
- W4313172891 hasBestOaLocation W43131728912 @default.
- W4313172891 hasConcept C115961682 @default.
- W4313172891 hasConcept C127413603 @default.
- W4313172891 hasConcept C153180895 @default.
- W4313172891 hasConcept C154945302 @default.
- W4313172891 hasConcept C204321447 @default.
- W4313172891 hasConcept C2524010 @default.
- W4313172891 hasConcept C2776036281 @default.
- W4313172891 hasConcept C2776321320 @default.
- W4313172891 hasConcept C2779506182 @default.
- W4313172891 hasConcept C2780861071 @default.
- W4313172891 hasConcept C2781213101 @default.
- W4313172891 hasConcept C28490314 @default.