Matches in SemOpenAlex for { <https://semopenalex.org/work/W3023895054> ?p ?o ?g. }
Showing items 1 to 95 of 95 with 100 items per page.
- W3023895054 abstract "This paper presents an audio visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information, to ground the ASR. We extract representations for audio features in the encoder layers of the transformer and fuse video features using an additional crossmodal multihead attention layer. Additionally, we incorporate a multitask training criterion for multiresolution ASR, where we train the model to generate both character and subword level transcriptions. Experimental results on the How2 dataset indicate that multiresolution training can speed up convergence by around 50% and relatively improves word error rate (WER) performance by up to 18% over subword prediction models. Further, incorporating visual information improves performance with relative gains up to 3.76% over audio only models. Our results are comparable to state-of-the-art Listen, Attend and Spell-based architectures." @default.
- W3023895054 created "2020-05-13" @default.
- W3023895054 creator A5013650472 @default.
- W3023895054 creator A5026032406 @default.
- W3023895054 creator A5028342307 @default.
- W3023895054 creator A5061295857 @default.
- W3023895054 date "2020-04-29" @default.
- W3023895054 modified "2023-09-23" @default.
- W3023895054 title "Multiresolution and Multimodal Speech Recognition with Transformers" @default.
- W3023895054 cites W1503933356 @default.
- W3023895054 cites W2127141656 @default.
- W3023895054 cites W2143612262 @default.
- W3023895054 cites W2327501763 @default.
- W3023895054 cites W2530876040 @default.
- W3023895054 cites W2884254529 @default.
- W3023895054 cites W2884975363 @default.
- W3023895054 cites W2890197052 @default.
- W3023895054 cites W2890952074 @default.
- W3023895054 cites W2897067191 @default.
- W3023895054 cites W2902348614 @default.
- W3023895054 cites W2940744433 @default.
- W3023895054 cites W2941814890 @default.
- W3023895054 cites W2943493972 @default.
- W3023895054 cites W2962778134 @default.
- W3023895054 cites W2962929176 @default.
- W3023895054 cites W2962934715 @default.
- W3023895054 cites W2963250244 @default.
- W3023895054 cites W2963303028 @default.
- W3023895054 cites W2963341956 @default.
- W3023895054 cites W2964110616 @default.
- W3023895054 cites W2964182350 @default.
- W3023895054 cites W2972451902 @default.
- W3023895054 cites W2972892814 @default.
- W3023895054 cites W2981165461 @default.
- W3023895054 cites W2994673210 @default.
- W3023895054 cites W3042657922 @default.
- W3023895054 cites W98035269 @default.
- W3023895054 hasPublicationYear "2020" @default.
- W3023895054 type Work @default.
- W3023895054 sameAs 3023895054 @default.
- W3023895054 citedByCount "4" @default.
- W3023895054 countsByYear W30238950542020 @default.
- W3023895054 countsByYear W30238950542021 @default.
- W3023895054 crossrefType "posted-content" @default.
- W3023895054 hasAuthorship W3023895054A5013650472 @default.
- W3023895054 hasAuthorship W3023895054A5026032406 @default.
- W3023895054 hasAuthorship W3023895054A5028342307 @default.
- W3023895054 hasAuthorship W3023895054A5061295857 @default.
- W3023895054 hasConcept C111919701 @default.
- W3023895054 hasConcept C118505674 @default.
- W3023895054 hasConcept C121332964 @default.
- W3023895054 hasConcept C154945302 @default.
- W3023895054 hasConcept C165801399 @default.
- W3023895054 hasConcept C28490314 @default.
- W3023895054 hasConcept C40969351 @default.
- W3023895054 hasConcept C41008148 @default.
- W3023895054 hasConcept C62520636 @default.
- W3023895054 hasConcept C66322947 @default.
- W3023895054 hasConceptScore W3023895054C111919701 @default.
- W3023895054 hasConceptScore W3023895054C118505674 @default.
- W3023895054 hasConceptScore W3023895054C121332964 @default.
- W3023895054 hasConceptScore W3023895054C154945302 @default.
- W3023895054 hasConceptScore W3023895054C165801399 @default.
- W3023895054 hasConceptScore W3023895054C28490314 @default.
- W3023895054 hasConceptScore W3023895054C40969351 @default.
- W3023895054 hasConceptScore W3023895054C41008148 @default.
- W3023895054 hasConceptScore W3023895054C62520636 @default.
- W3023895054 hasConceptScore W3023895054C66322947 @default.
- W3023895054 hasLocation W30238950541 @default.
- W3023895054 hasOpenAccess W3023895054 @default.
- W3023895054 hasPrimaryLocation W30238950541 @default.
- W3023895054 hasRelatedWork W2120776756 @default.
- W3023895054 hasRelatedWork W2951442257 @default.
- W3023895054 hasRelatedWork W2972889948 @default.
- W3023895054 hasRelatedWork W2980988419 @default.
- W3023895054 hasRelatedWork W2997725772 @default.
- W3023895054 hasRelatedWork W3011207290 @default.
- W3023895054 hasRelatedWork W3011234510 @default.
- W3023895054 hasRelatedWork W3015399080 @default.
- W3023895054 hasRelatedWork W3015449694 @default.
- W3023895054 hasRelatedWork W3015457435 @default.
- W3023895054 hasRelatedWork W3015678833 @default.
- W3023895054 hasRelatedWork W3015752032 @default.
- W3023895054 hasRelatedWork W3015889230 @default.
- W3023895054 hasRelatedWork W3016188195 @default.
- W3023895054 hasRelatedWork W3095189764 @default.
- W3023895054 hasRelatedWork W3096723250 @default.
- W3023895054 hasRelatedWork W3107588252 @default.
- W3023895054 hasRelatedWork W3141756469 @default.
- W3023895054 hasRelatedWork W3148040514 @default.
- W3023895054 hasRelatedWork W3171357516 @default.
- W3023895054 isParatext "false" @default.
- W3023895054 isRetracted "false" @default.
- W3023895054 magId "3023895054" @default.
- W3023895054 workType "article" @default.
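The abstract's fusion step, in which audio encoder representations attend over video features through a crossmodal attention layer, can be illustrated with a minimal sketch. It assumes single-head scaled dot-product attention with no learned query/key/value projections, equal audio and video feature dimensions, and residual addition as the fusion rule. The function names `cross_attention` and `fuse` are hypothetical. This is not the authors' implementation, which uses a multihead layer inside a Transformer.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_attention(audio: List[Vector], video: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention: audio frames are the
    queries, video feature vectors are both keys and values.
    Assumption: no learned projections, so dimensions must match."""
    d = len(video[0])
    if any(len(q) != d for q in audio):
        raise ValueError("audio and video feature dimensions must match")
    scale = 1.0 / math.sqrt(d)
    out: List[Vector] = []
    for q in audio:
        # Attention weights over all video vectors for this audio frame.
        weights = softmax([dot(q, k) * scale for k in video])
        # Context vector: weighted sum of the video feature vectors.
        ctx = [sum(w * v[i] for w, v in zip(weights, video)) for i in range(d)]
        out.append(ctx)
    return out


def fuse(audio: List[Vector], video: List[Vector]) -> List[Vector]:
    # Hypothetical fusion rule: add the attended video context to each audio frame.
    ctx = cross_attention(audio, video)
    return [[a + c for a, c in zip(af, cf)] for af, cf in zip(audio, ctx)]


if __name__ == "__main__":
    audio_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    video_feats = [[1.0, 0.0], [0.0, 1.0]]
    for frame in fuse(audio_frames, video_feats):
        print(frame)
```

Calling `fuse` with three 2-dimensional audio frames and two video vectors returns three fused 2-dimensional frames, one per audio frame, so the audio sequence length is preserved. A full multihead layer would add learned projections per head and an output projection; they are omitted here to keep the attention arithmetic visible.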