Matches in SemOpenAlex for { <https://semopenalex.org/work/W3206342208> ?p ?o ?g. }
Showing items 1 to 98 of 98 with 100 items per page.
- W3206342208 abstract "As the volume of long-form spoken-word content such as podcasts explodes, many platforms desire to present short, meaningful, and logically coherent segments extracted from the full content. Such segments can be consumed by users to sample content before diving in, as well as used by the platform to promote and recommend content. However, little published work is focused on the segmentation of spoken-word content, where the errors (noise) in transcripts generated by automatic speech recognition (ASR) services pose many challenges. Here we build a novel dataset of complete transcriptions of over 400 podcast episodes, in which we label the position of introductions in each episode. These introductions contain information about the episodes' topics, hosts, and guests, providing a valuable summary of the episode content, as it is created by the authors. We further augment our dataset with word substitutions to increase the amount of available training data. We train three Transformer models based on the pre-trained BERT and different augmentation strategies, which achieve significantly better performance compared with a static embedding model, showing that it is possible to capture generalized, larger-scale structural information from noisy, loosely-organized speech data. This is further demonstrated through an analysis of the models' inner architecture. Our methods and dataset can be used to facilitate future work on the structure-based segmentation of spoken-word content." @default.
- W3206342208 created "2021-10-25" @default.
- W3206342208 creator A5011550385 @default.
- W3206342208 creator A5018758964 @default.
- W3206342208 creator A5035937120 @default.
- W3206342208 creator A5083506441 @default.
- W3206342208 date "2021-10-13" @default.
- W3206342208 modified "2023-09-23" @default.
- W3206342208 title "Identifying Introductions in Podcast Episodes from Automatically Generated Transcripts" @default.
- W3206342208 cites W2044599851 @default.
- W3206342208 cites W2082118847 @default.
- W3206342208 cites W2112078533 @default.
- W3206342208 cites W2124585778 @default.
- W3206342208 cites W2250539671 @default.
- W3206342208 cites W2407578762 @default.
- W3206342208 cites W2427527485 @default.
- W3206342208 cites W2487372942 @default.
- W3206342208 cites W2512971201 @default.
- W3206342208 cites W2741619822 @default.
- W3206342208 cites W2770236317 @default.
- W3206342208 cites W2793710388 @default.
- W3206342208 cites W2798240283 @default.
- W3206342208 cites W2807938752 @default.
- W3206342208 cites W2808531442 @default.
- W3206342208 cites W2889446948 @default.
- W3206342208 cites W2893508485 @default.
- W3206342208 cites W2896457183 @default.
- W3206342208 cites W2911588830 @default.
- W3206342208 cites W2912804155 @default.
- W3206342208 cites W2946417913 @default.
- W3206342208 cites W2962369866 @default.
- W3206342208 cites W2980282514 @default.
- W3206342208 cites W3019713789 @default.
- W3206342208 cites W3102568136 @default.
- W3206342208 cites W3162734203 @default.
- W3206342208 cites W3171388604 @default.
- W3206342208 cites W3192144552 @default.
- W3206342208 doi "https://doi.org/10.48550/arxiv.2110.07096" @default.
- W3206342208 hasPublicationYear "2021" @default.
- W3206342208 type Work @default.
- W3206342208 sameAs 3206342208 @default.
- W3206342208 citedByCount "0" @default.
- W3206342208 crossrefType "posted-content" @default.
- W3206342208 hasAuthorship W3206342208A5011550385 @default.
- W3206342208 hasAuthorship W3206342208A5018758964 @default.
- W3206342208 hasAuthorship W3206342208A5035937120 @default.
- W3206342208 hasAuthorship W3206342208A5083506441 @default.
- W3206342208 hasBestOaLocation W32063422081 @default.
- W3206342208 hasConcept C121332964 @default.
- W3206342208 hasConcept C138885662 @default.
- W3206342208 hasConcept C154945302 @default.
- W3206342208 hasConcept C165801399 @default.
- W3206342208 hasConcept C204321447 @default.
- W3206342208 hasConcept C23123220 @default.
- W3206342208 hasConcept C2777462759 @default.
- W3206342208 hasConcept C28490314 @default.
- W3206342208 hasConcept C41008148 @default.
- W3206342208 hasConcept C41608201 @default.
- W3206342208 hasConcept C41895202 @default.
- W3206342208 hasConcept C62520636 @default.
- W3206342208 hasConcept C66322947 @default.
- W3206342208 hasConcept C89600930 @default.
- W3206342208 hasConcept C90805587 @default.
- W3206342208 hasConcept C98501671 @default.
- W3206342208 hasConceptScore W3206342208C121332964 @default.
- W3206342208 hasConceptScore W3206342208C138885662 @default.
- W3206342208 hasConceptScore W3206342208C154945302 @default.
- W3206342208 hasConceptScore W3206342208C165801399 @default.
- W3206342208 hasConceptScore W3206342208C204321447 @default.
- W3206342208 hasConceptScore W3206342208C23123220 @default.
- W3206342208 hasConceptScore W3206342208C2777462759 @default.
- W3206342208 hasConceptScore W3206342208C28490314 @default.
- W3206342208 hasConceptScore W3206342208C41008148 @default.
- W3206342208 hasConceptScore W3206342208C41608201 @default.
- W3206342208 hasConceptScore W3206342208C41895202 @default.
- W3206342208 hasConceptScore W3206342208C62520636 @default.
- W3206342208 hasConceptScore W3206342208C66322947 @default.
- W3206342208 hasConceptScore W3206342208C89600930 @default.
- W3206342208 hasConceptScore W3206342208C90805587 @default.
- W3206342208 hasConceptScore W3206342208C98501671 @default.
- W3206342208 hasLocation W32063422081 @default.
- W3206342208 hasLocation W32063422082 @default.
- W3206342208 hasOpenAccess W3206342208 @default.
- W3206342208 hasPrimaryLocation W32063422081 @default.
- W3206342208 hasRelatedWork W2335882425 @default.
- W3206342208 hasRelatedWork W2757566340 @default.
- W3206342208 hasRelatedWork W2949267551 @default.
- W3206342208 hasRelatedWork W2993300079 @default.
- W3206342208 hasRelatedWork W3107679445 @default.
- W3206342208 hasRelatedWork W3134737443 @default.
- W3206342208 hasRelatedWork W3143412223 @default.
- W3206342208 hasRelatedWork W3202766982 @default.
- W3206342208 hasRelatedWork W4221011941 @default.
- W3206342208 hasRelatedWork W4307613132 @default.
- W3206342208 isParatext "false" @default.
- W3206342208 isRetracted "false" @default.
- W3206342208 magId "3206342208" @default.
- W3206342208 workType "article" @default.
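
The items above all follow the shape of the query pattern in the first line: a subject, a predicate, an object, and a graph name. As a rough illustration only, the sketch below reads such lines into tuples and groups the objects by predicate. It assumes the rendered line shape `- SUBJECT PREDICATE OBJECT @GRAPH.` seen in this listing; the names `parse_line` and `group_by_predicate` are hypothetical helpers, not part of any SemOpenAlex tooling.

```python
import re
from collections import defaultdict

# Assumed line shape (taken from this listing, not from a spec):
#   - SUBJECT PREDICATE OBJECT @GRAPH.
# OBJECT is either a bare identifier or a double-quoted literal.
LINE_RE = re.compile(r'^- (\S+) (\S+) (?:"(.*)"|(\S+)) @(\S+)\.$')


def parse_line(line):
    """Return (subject, predicate, object, graph), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    subject, predicate, literal, identifier, graph = m.groups()
    obj = literal if literal is not None else identifier
    return subject, predicate, obj, graph


def group_by_predicate(lines):
    """Collect the objects of every parsed line under their predicate name."""
    grouped = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            _, predicate, obj, _ = parsed
            grouped[predicate].append(obj)
    return dict(grouped)


# A few lines copied from the listing above.
sample = [
    '- W3206342208 created "2021-10-25" @default.',
    '- W3206342208 cites W2044599851 @default.',
    '- W3206342208 cites W2082118847 @default.',
    '- W3206342208 type Work @default.',
]
result = group_by_predicate(sample)
print(result["cites"])
```

Lines that do not match the assumed shape, such as the header and paging lines, are skipped rather than raising an error.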