Matches in SemOpenAlex for { <https://semopenalex.org/work/W3035299099> ?p ?o ?g. }
Showing items 1 to 94 of 94 with 100 items per page.
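The pattern in the header can be reproduced as an ordinary SPARQL query. The sketch below only builds the query text and does not contact any endpoint. It assumes that the `@default` marker on each row denotes the default graph, so no `GRAPH` clause is used. The helper name `build_work_query` and the `limit` parameter are hypothetical and chosen for illustration.

```python
# Sketch: build a SPARQL query that lists every predicate/object pair
# for one SemOpenAlex work. Assumption: rows marked "@default" live in
# the default graph, so a plain triple pattern is enough.

WORK_IRI = "https://semopenalex.org/work/W3035299099"


def build_work_query(subject_iri: str, limit: int = 100) -> str:
    """Return SPARQL text selecting all (?p, ?o) pairs for subject_iri."""
    return (
        "SELECT ?p ?o WHERE {\n"
        f"  <{subject_iri}> ?p ?o .\n"
        "}\n"
        f"LIMIT {limit}"
    )


query = build_work_query(WORK_IRI)
print(query)
```

The default `limit` of 100 mirrors the page size shown above; the returned string can be passed to any SPARQL client.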
- W3035299099 abstract "This paper presents an audio-visual automatic speech recognition (AV-ASR) system using a Transformer-based architecture. We particularly focus on the scene context provided by the visual information to ground the ASR. We extract representations for audio features in the encoder layers of the Transformer and fuse video features using an additional crossmodal multihead attention layer. Additionally, we incorporate a multitask training criterion for multiresolution ASR, where we train the model to generate both character and subword level transcriptions. Experimental results on the How2 dataset indicate that multiresolution training can speed up convergence by around 50% and improves word error rate (WER) by up to 18% relative over subword prediction models. Further, incorporating visual information improves performance, with relative gains of up to 3.76% over audio-only models. Our results are comparable to state-of-the-art Listen, Attend and Spell-based architectures." @default.
- W3035299099 created "2020-06-19" @default.
- W3035299099 creator A5013650472 @default.
- W3035299099 creator A5026032406 @default.
- W3035299099 creator A5028342307 @default.
- W3035299099 creator A5061295857 @default.
- W3035299099 date "2020-01-01" @default.
- W3035299099 modified "2023-09-23" @default.
- W3035299099 title "Multimodal and Multiresolution Speech Recognition with Transformers" @default.
- W3035299099 cites W1503933356 @default.
- W3035299099 cites W2127141656 @default.
- W3035299099 cites W2143612262 @default.
- W3035299099 cites W2327501763 @default.
- W3035299099 cites W2530876040 @default.
- W3035299099 cites W2884254529 @default.
- W3035299099 cites W2884975363 @default.
- W3035299099 cites W2889624961 @default.
- W3035299099 cites W2890197052 @default.
- W3035299099 cites W2890952074 @default.
- W3035299099 cites W2897067191 @default.
- W3035299099 cites W2902348614 @default.
- W3035299099 cites W2940744433 @default.
- W3035299099 cites W2941814890 @default.
- W3035299099 cites W2962778134 @default.
- W3035299099 cites W2962929176 @default.
- W3035299099 cites W2962934715 @default.
- W3035299099 cites W2963250244 @default.
- W3035299099 cites W2963303028 @default.
- W3035299099 cites W2963341956 @default.
- W3035299099 cites W2963403868 @default.
- W3035299099 cites W2964051877 @default.
- W3035299099 cites W2964110616 @default.
- W3035299099 cites W2964182350 @default.
- W3035299099 cites W2972451902 @default.
- W3035299099 cites W2972892814 @default.
- W3035299099 cites W2981165461 @default.
- W3035299099 cites W2994673210 @default.
- W3035299099 cites W3042657922 @default.
- W3035299099 cites W854541894 @default.
- W3035299099 cites W98035269 @default.
- W3035299099 doi "https://doi.org/10.18653/v1/2020.acl-main.216" @default.
- W3035299099 hasPublicationYear "2020" @default.
- W3035299099 type Work @default.
- W3035299099 sameAs 3035299099 @default.
- W3035299099 citedByCount "19" @default.
- W3035299099 countsByYear W30352990992020 @default.
- W3035299099 countsByYear W30352990992021 @default.
- W3035299099 countsByYear W30352990992022 @default.
- W3035299099 countsByYear W30352990992023 @default.
- W3035299099 crossrefType "proceedings-article" @default.
- W3035299099 hasAuthorship W3035299099A5013650472 @default.
- W3035299099 hasAuthorship W3035299099A5026032406 @default.
- W3035299099 hasAuthorship W3035299099A5028342307 @default.
- W3035299099 hasAuthorship W3035299099A5061295857 @default.
- W3035299099 hasBestOaLocation W30352990991 @default.
- W3035299099 hasConcept C111919701 @default.
- W3035299099 hasConcept C118505674 @default.
- W3035299099 hasConcept C121332964 @default.
- W3035299099 hasConcept C154945302 @default.
- W3035299099 hasConcept C165801399 @default.
- W3035299099 hasConcept C28490314 @default.
- W3035299099 hasConcept C36464697 @default.
- W3035299099 hasConcept C40969351 @default.
- W3035299099 hasConcept C41008148 @default.
- W3035299099 hasConcept C62520636 @default.
- W3035299099 hasConcept C66322947 @default.
- W3035299099 hasConceptScore W3035299099C111919701 @default.
- W3035299099 hasConceptScore W3035299099C118505674 @default.
- W3035299099 hasConceptScore W3035299099C121332964 @default.
- W3035299099 hasConceptScore W3035299099C154945302 @default.
- W3035299099 hasConceptScore W3035299099C165801399 @default.
- W3035299099 hasConceptScore W3035299099C28490314 @default.
- W3035299099 hasConceptScore W3035299099C36464697 @default.
- W3035299099 hasConceptScore W3035299099C40969351 @default.
- W3035299099 hasConceptScore W3035299099C41008148 @default.
- W3035299099 hasConceptScore W3035299099C62520636 @default.
- W3035299099 hasConceptScore W3035299099C66322947 @default.
- W3035299099 hasLocation W30352990991 @default.
- W3035299099 hasOpenAccess W3035299099 @default.
- W3035299099 hasPrimaryLocation W30352990991 @default.
- W3035299099 hasRelatedWork W2275988210 @default.
- W3035299099 hasRelatedWork W2892009249 @default.
- W3035299099 hasRelatedWork W2998814410 @default.
- W3035299099 hasRelatedWork W3015974384 @default.
- W3035299099 hasRelatedWork W3137489363 @default.
- W3035299099 hasRelatedWork W3201953150 @default.
- W3035299099 hasRelatedWork W3209371554 @default.
- W3035299099 hasRelatedWork W4287266411 @default.
- W3035299099 hasRelatedWork W4287905975 @default.
- W3035299099 hasRelatedWork W4313001800 @default.
- W3035299099 isParatext "false" @default.
- W3035299099 isRetracted "false" @default.
- W3035299099 magId "3035299099" @default.
- W3035299099 workType "article" @default.
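
The abstract above reports its WER gains as relative improvements. As a minimal sketch of what that measure means, a relative reduction is the absolute drop in WER divided by the baseline WER. The function name and the input values below are hypothetical, picked only to illustrate the arithmetic; they are not figures from the paper.

```python
# Sketch: relative WER reduction = (baseline - new) / baseline.
# The numbers used here are made-up illustrations, not reported results.


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the fractional WER reduction of new_wer over baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical example: a drop from 20.0% WER to 16.4% WER.
example = relative_wer_reduction(20.0, 16.4)
print(f"{example:.1%}")
```

Note that a relative gain is always tied to its baseline: the same absolute drop of 3.6 WER points would be a smaller relative gain against a weaker baseline.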