Matches in SemOpenAlex for { <https://semopenalex.org/work/W3095909497> ?p ?o ?g. }
- W3095909497 endingPage "134" @default.
- W3095909497 startingPage "124" @default.
- W3095909497 abstract "Audio is the main channel through which visually impaired people obtain information. In practice, visual data of all kinds is widely available, but corresponding audio data is often missing. To help visually impaired people better perceive the information around them, this paper proposes an image-to-audio-description (I2AD) task that generates audio descriptions from images. To address this new task, a modal translation network (MT-Net) from the visual to the auditory sense is proposed. The proposed MT-Net includes three progressive sub-networks: 1) feature learning, 2) cross-modal mapping, and 3) audio generation. First, the feature learning sub-network learns semantic features from image and audio, comprising image feature learning and audio feature learning. Second, the cross-modal mapping sub-network transforms the image feature into a cross-modal representation with the same semantic concept as the audio feature. In this way, the correlation between the modalities is effectively mined to narrow the heterogeneous gap between image and audio. Finally, the audio generation sub-network generates the audio waveform from the cross-modal representation. The generated audio waveform is interpolated to obtain the corresponding audio file according to the sample frequency. As this is the first attempt to explore the I2AD task, three large-scale datasets with plenty of manual audio descriptions are built. Experiments on the datasets verify the feasibility of generating intelligible audio directly from an image and the effectiveness of the proposed method." @default.
- W3095909497 created "2020-11-09" @default.
- W3095909497 creator A5007883693 @default.
- W3095909497 creator A5011540763 @default.
- W3095909497 creator A5018824735 @default.
- W3095909497 creator A5055644335 @default.
- W3095909497 date "2021-01-01" @default.
- W3095909497 modified "2023-09-24" @default.
- W3095909497 title "Audio description from image by modal translation network" @default.
- W3095909497 cites W1981649281 @default.
- W3095909497 cites W2326114782 @default.
- W3095909497 cites W2510847698 @default.
- W3095909497 cites W2578879311 @default.
- W3095909497 cites W2619383789 @default.
- W3095909497 cites W2619697695 @default.
- W3095909497 cites W2740443574 @default.
- W3095909497 cites W2746105469 @default.
- W3095909497 cites W2795389793 @default.
- W3095909497 cites W2795832645 @default.
- W3095909497 cites W2889924266 @default.
- W3095909497 cites W2900904083 @default.
- W3095909497 cites W2908374254 @default.
- W3095909497 cites W2944026969 @default.
- W3095909497 cites W2958442373 @default.
- W3095909497 cites W2962765188 @default.
- W3095909497 cites W2962865004 @default.
- W3095909497 cites W2963066677 @default.
- W3095909497 cites W2963290645 @default.
- W3095909497 cites W2963351212 @default.
- W3095909497 cites W2963663420 @default.
- W3095909497 cites W2964352155 @default.
- W3095909497 cites W2966350350 @default.
- W3095909497 cites W2985214930 @default.
- W3095909497 cites W2987489329 @default.
- W3095909497 cites W2987989623 @default.
- W3095909497 cites W2989489923 @default.
- W3095909497 cites W2995904231 @default.
- W3095909497 cites W3005718951 @default.
- W3095909497 cites W3020188837 @default.
- W3095909497 cites W3025645599 @default.
- W3095909497 cites W4243894986 @default.
- W3095909497 doi "https://doi.org/10.1016/j.neucom.2020.10.053" @default.
- W3095909497 hasPublicationYear "2021" @default.
- W3095909497 type Work @default.
- W3095909497 sameAs 3095909497 @default.
- W3095909497 citedByCount "10" @default.
- W3095909497 countsByYear W30959094972021 @default.
- W3095909497 countsByYear W30959094972022 @default.
- W3095909497 countsByYear W30959094972023 @default.
- W3095909497 crossrefType "journal-article" @default.
- W3095909497 hasAuthorship W3095909497A5007883693 @default.
- W3095909497 hasAuthorship W3095909497A5011540763 @default.
- W3095909497 hasAuthorship W3095909497A5018824735 @default.
- W3095909497 hasAuthorship W3095909497A5055644335 @default.
- W3095909497 hasBestOaLocation W30959094972 @default.
- W3095909497 hasConcept C115961682 @default.
- W3095909497 hasConcept C138885662 @default.
- W3095909497 hasConcept C153180895 @default.
- W3095909497 hasConcept C154945302 @default.
- W3095909497 hasConcept C155635449 @default.
- W3095909497 hasConcept C157968479 @default.
- W3095909497 hasConcept C162324750 @default.
- W3095909497 hasConcept C17744445 @default.
- W3095909497 hasConcept C185592680 @default.
- W3095909497 hasConcept C187736073 @default.
- W3095909497 hasConcept C188027245 @default.
- W3095909497 hasConcept C199539241 @default.
- W3095909497 hasConcept C2776359362 @default.
- W3095909497 hasConcept C2776401178 @default.
- W3095909497 hasConcept C2779757391 @default.
- W3095909497 hasConcept C2780451532 @default.
- W3095909497 hasConcept C28490314 @default.
- W3095909497 hasConcept C41008148 @default.
- W3095909497 hasConcept C41895202 @default.
- W3095909497 hasConcept C59404180 @default.
- W3095909497 hasConcept C61328038 @default.
- W3095909497 hasConcept C71139939 @default.
- W3095909497 hasConcept C94625758 @default.
- W3095909497 hasConceptScore W3095909497C115961682 @default.
- W3095909497 hasConceptScore W3095909497C138885662 @default.
- W3095909497 hasConceptScore W3095909497C153180895 @default.
- W3095909497 hasConceptScore W3095909497C154945302 @default.
- W3095909497 hasConceptScore W3095909497C155635449 @default.
- W3095909497 hasConceptScore W3095909497C157968479 @default.
- W3095909497 hasConceptScore W3095909497C162324750 @default.
- W3095909497 hasConceptScore W3095909497C17744445 @default.
- W3095909497 hasConceptScore W3095909497C185592680 @default.
- W3095909497 hasConceptScore W3095909497C187736073 @default.
- W3095909497 hasConceptScore W3095909497C188027245 @default.
- W3095909497 hasConceptScore W3095909497C199539241 @default.
- W3095909497 hasConceptScore W3095909497C2776359362 @default.
- W3095909497 hasConceptScore W3095909497C2776401178 @default.
- W3095909497 hasConceptScore W3095909497C2779757391 @default.
- W3095909497 hasConceptScore W3095909497C2780451532 @default.
- W3095909497 hasConceptScore W3095909497C28490314 @default.
- W3095909497 hasConceptScore W3095909497C41008148 @default.
- W3095909497 hasConceptScore W3095909497C41895202 @default.
- W3095909497 hasConceptScore W3095909497C59404180 @default.