Matches in SemOpenAlex for { <https://semopenalex.org/work/W4386076133> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W4386076133 abstract "Audiovisual automatic speech recognition (AV-ASR) aims to improve the robustness of a speech recognition system by incorporating visual information. Training fully supervised multimodal models for this task from scratch, however, is limited by the need for large labelled audiovisual datasets (in each downstream domain of interest). We present AVFormer, a simple method for augmenting audio-only models with visual information, at the same time performing lightweight domain adaptation. We do this by (i) injecting visual embeddings into a frozen ASR model using lightweight trainable adaptors. We show that these can be trained on a small amount of weakly labelled video data with minimum additional training time and parameters. (ii) We also introduce a simple curriculum scheme during training which we show is crucial to enable the model to jointly process audio and visual information effectively; and finally (iii) we show that our model achieves state-of-the-art zero-shot results on three different AV-ASR benchmarks (How2, VisSpeech and Ego4D), while also crucially preserving decent performance on traditional audio-only speech recognition benchmarks (LibriSpeech). Qualitative results show that our model effectively leverages visual information for robust speech recognition." @default.
- W4386076133 created "2023-08-23" @default.
- W4386076133 creator A5036002448 @default.
- W4386076133 creator A5045217258 @default.
- W4386076133 creator A5051808590 @default.
- W4386076133 date "2023-06-01" @default.
- W4386076133 modified "2023-09-27" @default.
- W4386076133 title "AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR" @default.
- W4386076133 cites W2076462394 @default.
- W4386076133 cites W2127141656 @default.
- W4386076133 cites W2289925289 @default.
- W4386076133 cites W2327501763 @default.
- W4386076133 cites W2512544816 @default.
- W4386076133 cites W2714726990 @default.
- W4386076133 cites W2952746495 @default.
- W4386076133 cites W2962866381 @default.
- W4386076133 cites W2963654155 @default.
- W4386076133 cites W2964182350 @default.
- W4386076133 cites W2984008963 @default.
- W4386076133 cites W2995181338 @default.
- W4386076133 cites W3006974783 @default.
- W4386076133 cites W3015678833 @default.
- W4386076133 cites W3035299099 @default.
- W4386076133 cites W3035635319 @default.
- W4386076133 cites W3097777922 @default.
- W4386076133 cites W3160525311 @default.
- W4386076133 cites W3162293946 @default.
- W4386076133 cites W3205786327 @default.
- W4386076133 cites W3209059054 @default.
- W4386076133 cites W4205991051 @default.
- W4386076133 cites W4226033575 @default.
- W4386076133 doi "https://doi.org/10.1109/cvpr52729.2023.02195" @default.
- W4386076133 hasPublicationYear "2023" @default.
- W4386076133 type Work @default.
- W4386076133 citedByCount "0" @default.
- W4386076133 crossrefType "proceedings-article" @default.
- W4386076133 hasAuthorship W4386076133A5036002448 @default.
- W4386076133 hasAuthorship W4386076133A5045217258 @default.
- W4386076133 hasAuthorship W4386076133A5051808590 @default.
- W4386076133 hasConcept C104317684 @default.
- W4386076133 hasConcept C111919701 @default.
- W4386076133 hasConcept C154945302 @default.
- W4386076133 hasConcept C162324750 @default.
- W4386076133 hasConcept C178790620 @default.
- W4386076133 hasConcept C185592680 @default.
- W4386076133 hasConcept C187736073 @default.
- W4386076133 hasConcept C2778344882 @default.
- W4386076133 hasConcept C2780451532 @default.
- W4386076133 hasConcept C2781235140 @default.
- W4386076133 hasConcept C28490314 @default.
- W4386076133 hasConcept C3017588708 @default.
- W4386076133 hasConcept C41008148 @default.
- W4386076133 hasConcept C49774154 @default.
- W4386076133 hasConcept C55493867 @default.
- W4386076133 hasConcept C63479239 @default.
- W4386076133 hasConceptScore W4386076133C104317684 @default.
- W4386076133 hasConceptScore W4386076133C111919701 @default.
- W4386076133 hasConceptScore W4386076133C154945302 @default.
- W4386076133 hasConceptScore W4386076133C162324750 @default.
- W4386076133 hasConceptScore W4386076133C178790620 @default.
- W4386076133 hasConceptScore W4386076133C185592680 @default.
- W4386076133 hasConceptScore W4386076133C187736073 @default.
- W4386076133 hasConceptScore W4386076133C2778344882 @default.
- W4386076133 hasConceptScore W4386076133C2780451532 @default.
- W4386076133 hasConceptScore W4386076133C2781235140 @default.
- W4386076133 hasConceptScore W4386076133C28490314 @default.
- W4386076133 hasConceptScore W4386076133C3017588708 @default.
- W4386076133 hasConceptScore W4386076133C41008148 @default.
- W4386076133 hasConceptScore W4386076133C49774154 @default.
- W4386076133 hasConceptScore W4386076133C55493867 @default.
- W4386076133 hasConceptScore W4386076133C63479239 @default.
- W4386076133 hasLocation W43860761331 @default.
- W4386076133 hasOpenAccess W4386076133 @default.
- W4386076133 hasPrimaryLocation W43860761331 @default.
- W4386076133 hasRelatedWork W2000534859 @default.
- W4386076133 hasRelatedWork W2065109233 @default.
- W4386076133 hasRelatedWork W2081647779 @default.
- W4386076133 hasRelatedWork W2134867751 @default.
- W4386076133 hasRelatedWork W2541791370 @default.
- W4386076133 hasRelatedWork W2981091784 @default.
- W4386076133 hasRelatedWork W3213778687 @default.
- W4386076133 hasRelatedWork W4237750775 @default.
- W4386076133 hasRelatedWork W4290805679 @default.
- W4386076133 hasRelatedWork W4313347119 @default.
- W4386076133 isParatext "false" @default.
- W4386076133 isRetracted "false" @default.
- W4386076133 workType "article" @default.