Matches in SemOpenAlex for { <https://semopenalex.org/work/W3092474243> ?p ?o ?g. }
- W3092474243 abstract "Multimodal automatic speech recognition systems integrate information from images to improve speech recognition quality, by grounding the speech in the visual context. While visual signals have been shown to be useful for recovering entities that have been masked in the audio, these models should be capable of recovering a broader range of word types. Existing systems rely on global visual features that represent the entire image, but localizing the relevant regions of the image will make it possible to recover a larger set of words, such as adjectives and verbs. In this paper, we propose a model that uses finer-grained visual information from different parts of the image, using automatic object proposals. In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals." @default.
- W3092474243 created "2020-10-15" @default.
- W3092474243 creator A5010165733 @default.
- W3092474243 creator A5053704856 @default.
- W3092474243 creator A5082028315 @default.
- W3092474243 creator A5085262529 @default.
- W3092474243 date "2020-10-05" @default.
- W3092474243 modified "2023-09-27" @default.
- W3092474243 title "Fine-Grained Grounding for Multimodal Speech Recognition" @default.
- W3092474243 cites W1514535095 @default.
- W3092474243 cites W1522301498 @default.
- W3092474243 cites W1895577753 @default.
- W3092474243 cites W1933349210 @default.
- W3092474243 cites W1986533012 @default.
- W3092474243 cites W2043701535 @default.
- W3092474243 cites W2064675550 @default.
- W3092474243 cites W2117539524 @default.
- W3092474243 cites W2131774270 @default.
- W3092474243 cites W2157331557 @default.
- W3092474243 cites W2171775221 @default.
- W3092474243 cites W2194775991 @default.
- W3092474243 cites W2277195237 @default.
- W3092474243 cites W2295158492 @default.
- W3092474243 cites W2327501763 @default.
- W3092474243 cites W2402302915 @default.
- W3092474243 cites W2512544816 @default.
- W3092474243 cites W2714726990 @default.
- W3092474243 cites W2745461083 @default.
- W3092474243 cites W2889903020 @default.
- W3092474243 cites W2912954290 @default.
- W3092474243 cites W2914781455 @default.
- W3092474243 cites W2934842096 @default.
- W3092474243 cites W2936774411 @default.
- W3092474243 cites W2948140294 @default.
- W3092474243 cites W2950178297 @default.
- W3092474243 cites W2950761309 @default.
- W3092474243 cites W2953104586 @default.
- W3092474243 cites W2953106684 @default.
- W3092474243 cites W2953472911 @default.
- W3092474243 cites W2962685807 @default.
- W3092474243 cites W2962826786 @default.
- W3092474243 cites W2962832640 @default.
- W3092474243 cites W2962862718 @default.
- W3092474243 cites W2962866381 @default.
- W3092474243 cites W2963140463 @default.
- W3092474243 cites W2963347649 @default.
- W3092474243 cites W2963407669 @default.
- W3092474243 cites W2963565375 @default.
- W3092474243 cites W2964182350 @default.
- W3092474243 cites W2988907666 @default.
- W3092474243 cites W2991293848 @default.
- W3092474243 cites W2992526251 @default.
- W3092474243 cites W3005285208 @default.
- W3092474243 cites W3015678833 @default.
- W3092474243 cites W3031301501 @default.
- W3092474243 cites W3042657922 @default.
- W3092474243 cites W3098507616 @default.
- W3092474243 hasPublicationYear "2020" @default.
- W3092474243 type Work @default.
- W3092474243 sameAs 3092474243 @default.
- W3092474243 citedByCount "0" @default.
- W3092474243 crossrefType "posted-content" @default.
- W3092474243 hasAuthorship W3092474243A5010165733 @default.
- W3092474243 hasAuthorship W3092474243A5053704856 @default.
- W3092474243 hasAuthorship W3092474243A5082028315 @default.
- W3092474243 hasAuthorship W3092474243A5085262529 @default.
- W3092474243 hasConcept C138885662 @default.
- W3092474243 hasConcept C151730666 @default.
- W3092474243 hasConcept C154945302 @default.
- W3092474243 hasConcept C177264268 @default.
- W3092474243 hasConcept C199360897 @default.
- W3092474243 hasConcept C204321447 @default.
- W3092474243 hasConcept C2779343474 @default.
- W3092474243 hasConcept C2781238097 @default.
- W3092474243 hasConcept C28490314 @default.
- W3092474243 hasConcept C41008148 @default.
- W3092474243 hasConcept C41895202 @default.
- W3092474243 hasConcept C86803240 @default.
- W3092474243 hasConcept C90805587 @default.
- W3092474243 hasConceptScore W3092474243C138885662 @default.
- W3092474243 hasConceptScore W3092474243C151730666 @default.
- W3092474243 hasConceptScore W3092474243C154945302 @default.
- W3092474243 hasConceptScore W3092474243C177264268 @default.
- W3092474243 hasConceptScore W3092474243C199360897 @default.
- W3092474243 hasConceptScore W3092474243C204321447 @default.
- W3092474243 hasConceptScore W3092474243C2779343474 @default.
- W3092474243 hasConceptScore W3092474243C2781238097 @default.
- W3092474243 hasConceptScore W3092474243C28490314 @default.
- W3092474243 hasConceptScore W3092474243C41008148 @default.
- W3092474243 hasConceptScore W3092474243C41895202 @default.
- W3092474243 hasConceptScore W3092474243C86803240 @default.
- W3092474243 hasConceptScore W3092474243C90805587 @default.
- W3092474243 hasLocation W30924742431 @default.
- W3092474243 hasOpenAccess W3092474243 @default.
- W3092474243 hasPrimaryLocation W30924742431 @default.
- W3092474243 hasRelatedWork W1514139200 @default.
- W3092474243 hasRelatedWork W1549293594 @default.
- W3092474243 hasRelatedWork W182684292 @default.
- W3092474243 hasRelatedWork W2114449787 @default.
- W3092474243 hasRelatedWork W2123024445 @default.