Matches in SemOpenAlex for { <https://semopenalex.org/work/W4309957268> ?p ?o ?g. }
Showing items 1 to 68 of 68 with 100 items per page.
- W4309957268 abstract "Vision transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships. Constrained by the data-hungry nature of transformers and limited labelled data, most transformer-based models for audio tasks are finetuned from ImageNet pretrained models, despite the huge gap between the natural images domain and audio domain. This has motivated the research in self-supervised pretraining of audio transformers, which reduces the dependency on large amounts of labeled data and focuses on extracting concise representations of the audio spectrograms. In this paper, we propose ASiT, a novel self-supervised transformer for general audio representations that captures local and global contextual information employing group masked model learning and self-distillation. We evaluate our pretrained models on both audio and speech classification tasks including audio event classification, keyword spotting, and speaker identification. We further conduct comprehensive ablation studies, including evaluations of different pretraining strategies. The proposed ASiT framework significantly boosts the performance on all tasks and sets a new state-of-the-art performance on five audio and speech classification tasks, outperforming recent methods, including the approaches that use additional datasets for pretraining. The code and pretrained weights will be made publicly available for the scientific community." @default.
- W4309957268 created "2022-11-30" @default.
- W4309957268 creator A5028209738 @default.
- W4309957268 creator A5037459105 @default.
- W4309957268 creator A5037691180 @default.
- W4309957268 creator A5066967599 @default.
- W4309957268 creator A5087909247 @default.
- W4309957268 date "2022-11-23" @default.
- W4309957268 modified "2023-09-27" @default.
- W4309957268 title "ASiT: Audio Spectrogram vIsion Transformer for General Audio Representation" @default.
- W4309957268 doi "https://doi.org/10.48550/arxiv.2211.13189" @default.
- W4309957268 hasPublicationYear "2022" @default.
- W4309957268 type Work @default.
- W4309957268 citedByCount "0" @default.
- W4309957268 crossrefType "posted-content" @default.
- W4309957268 hasAuthorship W4309957268A5028209738 @default.
- W4309957268 hasAuthorship W4309957268A5037459105 @default.
- W4309957268 hasAuthorship W4309957268A5037691180 @default.
- W4309957268 hasAuthorship W4309957268A5066967599 @default.
- W4309957268 hasAuthorship W4309957268A5087909247 @default.
- W4309957268 hasBestOaLocation W43099572681 @default.
- W4309957268 hasConcept C119599485 @default.
- W4309957268 hasConcept C119857082 @default.
- W4309957268 hasConcept C127413603 @default.
- W4309957268 hasConcept C13895895 @default.
- W4309957268 hasConcept C154945302 @default.
- W4309957268 hasConcept C155635449 @default.
- W4309957268 hasConcept C157968479 @default.
- W4309957268 hasConcept C165801399 @default.
- W4309957268 hasConcept C2781213101 @default.
- W4309957268 hasConcept C28490314 @default.
- W4309957268 hasConcept C41008148 @default.
- W4309957268 hasConcept C45273575 @default.
- W4309957268 hasConcept C61328038 @default.
- W4309957268 hasConcept C64922751 @default.
- W4309957268 hasConcept C66322947 @default.
- W4309957268 hasConceptScore W4309957268C119599485 @default.
- W4309957268 hasConceptScore W4309957268C119857082 @default.
- W4309957268 hasConceptScore W4309957268C127413603 @default.
- W4309957268 hasConceptScore W4309957268C13895895 @default.
- W4309957268 hasConceptScore W4309957268C154945302 @default.
- W4309957268 hasConceptScore W4309957268C155635449 @default.
- W4309957268 hasConceptScore W4309957268C157968479 @default.
- W4309957268 hasConceptScore W4309957268C165801399 @default.
- W4309957268 hasConceptScore W4309957268C2781213101 @default.
- W4309957268 hasConceptScore W4309957268C28490314 @default.
- W4309957268 hasConceptScore W4309957268C41008148 @default.
- W4309957268 hasConceptScore W4309957268C45273575 @default.
- W4309957268 hasConceptScore W4309957268C61328038 @default.
- W4309957268 hasConceptScore W4309957268C64922751 @default.
- W4309957268 hasConceptScore W4309957268C66322947 @default.
- W4309957268 hasLocation W43099572681 @default.
- W4309957268 hasLocation W43099572682 @default.
- W4309957268 hasOpenAccess W4309957268 @default.
- W4309957268 hasPrimaryLocation W43099572681 @default.
- W4309957268 hasRelatedWork W1593153379 @default.
- W4309957268 hasRelatedWork W1985168493 @default.
- W4309957268 hasRelatedWork W2100584800 @default.
- W4309957268 hasRelatedWork W2205714567 @default.
- W4309957268 hasRelatedWork W2236677194 @default.
- W4309957268 hasRelatedWork W2736031499 @default.
- W4309957268 hasRelatedWork W2754746744 @default.
- W4309957268 hasRelatedWork W4382935140 @default.
- W4309957268 hasRelatedWork W4385474305 @default.
- W4309957268 hasRelatedWork W54679027 @default.
- W4309957268 isParatext "false" @default.
- W4309957268 isRetracted "false" @default.
- W4309957268 workType "article" @default.