Matches in SemOpenAlex for { <https://semopenalex.org/work/W3041561163> ?p ?o ?g. }
- W3041561163 endingPage "2366" @default.
- W3041561163 startingPage "2351" @default.
- W3041561163 abstract "We introduce a self-supervised speech pre-training method called TERA, which stands for Transformer Encoder Representations from Alteration. Recent approaches often learn by using a single auxiliary task like contrastive prediction, autoregressive prediction, or masked reconstruction. Unlike previous methods, we use alteration along three orthogonal axes to pre-train Transformer Encoders on a large amount of unlabeled speech. The model learns through the reconstruction of acoustic frames from their altered counterparts, where we use a stochastic policy to alter along various dimensions: time, frequency, and magnitude. TERA can be used for speech representation extraction or fine-tuning with downstream models. We evaluate TERA on several downstream tasks, including phoneme classification, keyword spotting, speaker recognition, and speech recognition. We present a large-scale comparison of various self-supervised models. TERA achieves strong performance in the comparison by improving upon surface features and outperforming previous models. In our experiments, we study the effect of applying different alteration techniques, pre-training on more data, and pre-training on various features. We analyze different model sizes and find that smaller models are stronger representation learners than larger models, while larger models are more effective for downstream fine-tuning than smaller models. Furthermore, we show the proposed method is transferable to downstream datasets not used in pre-training." @default.
- W3041561163 created "2020-07-16" @default.
- W3041561163 creator A5018050919 @default.
- W3041561163 creator A5029566548 @default.
- W3041561163 creator A5040508737 @default.
- W3041561163 date "2021-01-01" @default.
- W3041561163 modified "2023-10-16" @default.
- W3041561163 title "TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech" @default.
- W3041561163 cites W1494198834 @default.
- W3041561163 cites W2002342963 @default.
- W3041561163 cites W2077804127 @default.
- W3041561163 cites W2127141656 @default.
- W3041561163 cites W2794209590 @default.
- W3041561163 cites W2927746189 @default.
- W3041561163 cites W2936774411 @default.
- W3041561163 cites W2947445680 @default.
- W3041561163 cites W2962739339 @default.
- W3041561163 cites W2962901777 @default.
- W3041561163 cites W2964227577 @default.
- W3041561163 cites W2972451902 @default.
- W3041561163 cites W2972943112 @default.
- W3041561163 cites W2973049979 @default.
- W3041561163 cites W2973157397 @default.
- W3041561163 cites W2982223350 @default.
- W3041561163 cites W2995181338 @default.
- W3041561163 cites W3003875258 @default.
- W3041561163 cites W3015265920 @default.
- W3041561163 cites W3015356564 @default.
- W3041561163 cites W3015412890 @default.
- W3041561163 cites W3015949486 @default.
- W3041561163 cites W3016011332 @default.
- W3041561163 cites W3016181583 @default.
- W3041561163 cites W3024182269 @default.
- W3041561163 cites W3035202887 @default.
- W3041561163 cites W3048217718 @default.
- W3041561163 cites W3096017728 @default.
- W3041561163 cites W3096171739 @default.
- W3041561163 cites W3096485810 @default.
- W3041561163 cites W3096626135 @default.
- W3041561163 cites W3097286738 @default.
- W3041561163 cites W3125709657 @default.
- W3041561163 doi "https://doi.org/10.1109/taslp.2021.3095662" @default.
- W3041561163 hasPublicationYear "2021" @default.
- W3041561163 type Work @default.
- W3041561163 sameAs 3041561163 @default.
- W3041561163 citedByCount "114" @default.
- W3041561163 countsByYear W30415611632020 @default.
- W3041561163 countsByYear W30415611632021 @default.
- W3041561163 countsByYear W30415611632022 @default.
- W3041561163 countsByYear W30415611632023 @default.
- W3041561163 crossrefType "journal-article" @default.
- W3041561163 hasAuthorship W3041561163A5018050919 @default.
- W3041561163 hasAuthorship W3041561163A5029566548 @default.
- W3041561163 hasAuthorship W3041561163A5040508737 @default.
- W3041561163 hasBestOaLocation W30415611632 @default.
- W3041561163 hasConcept C111919701 @default.
- W3041561163 hasConcept C118505674 @default.
- W3041561163 hasConcept C119599485 @default.
- W3041561163 hasConcept C119857082 @default.
- W3041561163 hasConcept C127413603 @default.
- W3041561163 hasConcept C149782125 @default.
- W3041561163 hasConcept C153180895 @default.
- W3041561163 hasConcept C154945302 @default.
- W3041561163 hasConcept C159877910 @default.
- W3041561163 hasConcept C165801399 @default.
- W3041561163 hasConcept C2775904623 @default.
- W3041561163 hasConcept C2781213101 @default.
- W3041561163 hasConcept C28490314 @default.
- W3041561163 hasConcept C33923547 @default.
- W3041561163 hasConcept C41008148 @default.
- W3041561163 hasConcept C59404180 @default.
- W3041561163 hasConcept C61328038 @default.
- W3041561163 hasConcept C66322947 @default.
- W3041561163 hasConceptScore W3041561163C111919701 @default.
- W3041561163 hasConceptScore W3041561163C118505674 @default.
- W3041561163 hasConceptScore W3041561163C119599485 @default.
- W3041561163 hasConceptScore W3041561163C119857082 @default.
- W3041561163 hasConceptScore W3041561163C127413603 @default.
- W3041561163 hasConceptScore W3041561163C149782125 @default.
- W3041561163 hasConceptScore W3041561163C153180895 @default.
- W3041561163 hasConceptScore W3041561163C154945302 @default.
- W3041561163 hasConceptScore W3041561163C159877910 @default.
- W3041561163 hasConceptScore W3041561163C165801399 @default.
- W3041561163 hasConceptScore W3041561163C2775904623 @default.
- W3041561163 hasConceptScore W3041561163C2781213101 @default.
- W3041561163 hasConceptScore W3041561163C28490314 @default.
- W3041561163 hasConceptScore W3041561163C33923547 @default.
- W3041561163 hasConceptScore W3041561163C41008148 @default.
- W3041561163 hasConceptScore W3041561163C59404180 @default.
- W3041561163 hasConceptScore W3041561163C61328038 @default.
- W3041561163 hasConceptScore W3041561163C66322947 @default.
- W3041561163 hasFunder F4320321040 @default.
- W3041561163 hasFunder F4320323900 @default.
- W3041561163 hasLocation W30415611631 @default.
- W3041561163 hasLocation W30415611632 @default.
- W3041561163 hasOpenAccess W3041561163 @default.
- W3041561163 hasPrimaryLocation W30415611631 @default.
- W3041561163 hasRelatedWork W2372328424 @default.