Matches in SemOpenAlex for { <https://semopenalex.org/work/W2001068000> ?p ?o ?g. }
- W2001068000 endingPage "494" @default.
- W2001068000 startingPage "468" @default.
- W2001068000 abstract "Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have constructed a hidden Markov model (HMM) system to detect sentence boundaries that uses both prosodic and textual information. Since there are more nonsentence boundaries than sentence boundaries in the data, the prosody model, which is implemented as a decision tree classifier, must be constructed to effectively learn from the imbalanced data distribution. To address this problem, we investigate a variety of sampling approaches and a bagging scheme. A pilot study was carried out to select methods to apply to the full NIST sentence boundary evaluation task across two corpora (conversational telephone speech and broadcast news speech), using both human transcriptions and recognition output. In the pilot study, when classification error rate is the performance measure, using the original training set achieves the best performance among the sampling methods, and an ensemble of multiple classifiers from different downsampled training sets achieves slightly poorer performance, but has the potential to reduce computational effort. However, when performance is measured using receiver operating characteristics (ROC) or area under the curve (AUC), then the sampling approaches outperform the original training set. This observation is important if the sentence boundary detection output is used by downstream language processing modules. Bagging was found to significantly improve system performance for each of the sampling methods. The gain from these methods may be diminished when the prosody model is combined with the language model, which is a strong knowledge source for the sentence detection task. The patterns found in the pilot study were replicated in the full NIST evaluation task. The conclusions may be dependent on the task, the classifiers, and the knowledge combination approach." @default.
- W2001068000 created "2016-06-24" @default.
- W2001068000 creator A5023363049 @default.
- W2001068000 creator A5053498662 @default.
- W2001068000 creator A5055623733 @default.
- W2001068000 creator A5060979948 @default.
- W2001068000 creator A5068157871 @default.
- W2001068000 date "2006-10-01" @default.
- W2001068000 modified "2023-09-26" @default.
- W2001068000 title "A study in machine learning from imbalanced data for sentence boundary detection in speech" @default.
- W2001068000 cites W140329658 @default.
- W2001068000 cites W1504308419 @default.
- W2001068000 cites W1511530654 @default.
- W2001068000 cites W1563235770 @default.
- W2001068000 cites W1605688901 @default.
- W2001068000 cites W1605695115 @default.
- W2001068000 cites W1846608861 @default.
- W2001068000 cites W192736094 @default.
- W2001068000 cites W1941659294 @default.
- W2001068000 cites W2005252629 @default.
- W2001068000 cites W2007605886 @default.
- W2001068000 cites W2014006248 @default.
- W2001068000 cites W2018560257 @default.
- W2001068000 cites W2065388812 @default.
- W2001068000 cites W2078830591 @default.
- W2001068000 cites W2088160102 @default.
- W2001068000 cites W2096942889 @default.
- W2001068000 cites W2105594594 @default.
- W2001068000 cites W2114968414 @default.
- W2001068000 cites W2122591164 @default.
- W2001068000 cites W2133065881 @default.
- W2001068000 cites W2137029138 @default.
- W2001068000 cites W2148143831 @default.
- W2001068000 cites W2155653793 @default.
- W2001068000 cites W2170502024 @default.
- W2001068000 cites W2426479676 @default.
- W2001068000 cites W260797086 @default.
- W2001068000 cites W32283220 @default.
- W2001068000 cites W4212883601 @default.
- W2001068000 cites W4300584885 @default.
- W2001068000 cites W94670513 @default.
- W2001068000 doi "https://doi.org/10.1016/j.csl.2005.06.002" @default.
- W2001068000 hasPublicationYear "2006" @default.
- W2001068000 type Work @default.
- W2001068000 sameAs 2001068000 @default.
- W2001068000 citedByCount "106" @default.
- W2001068000 countsByYear W20010680002012 @default.
- W2001068000 countsByYear W20010680002013 @default.
- W2001068000 countsByYear W20010680002014 @default.
- W2001068000 countsByYear W20010680002015 @default.
- W2001068000 countsByYear W20010680002016 @default.
- W2001068000 countsByYear W20010680002017 @default.
- W2001068000 countsByYear W20010680002018 @default.
- W2001068000 countsByYear W20010680002019 @default.
- W2001068000 countsByYear W20010680002020 @default.
- W2001068000 countsByYear W20010680002021 @default.
- W2001068000 countsByYear W20010680002022 @default.
- W2001068000 countsByYear W20010680002023 @default.
- W2001068000 crossrefType "journal-article" @default.
- W2001068000 hasAuthorship W2001068000A5023363049 @default.
- W2001068000 hasAuthorship W2001068000A5053498662 @default.
- W2001068000 hasAuthorship W2001068000A5055623733 @default.
- W2001068000 hasAuthorship W2001068000A5060979948 @default.
- W2001068000 hasAuthorship W2001068000A5068157871 @default.
- W2001068000 hasConcept C106131492 @default.
- W2001068000 hasConcept C111219384 @default.
- W2001068000 hasConcept C119857082 @default.
- W2001068000 hasConcept C137293760 @default.
- W2001068000 hasConcept C140779682 @default.
- W2001068000 hasConcept C154945302 @default.
- W2001068000 hasConcept C204321447 @default.
- W2001068000 hasConcept C23224414 @default.
- W2001068000 hasConcept C2777530160 @default.
- W2001068000 hasConcept C28490314 @default.
- W2001068000 hasConcept C31972630 @default.
- W2001068000 hasConcept C41008148 @default.
- W2001068000 hasConcept C542774811 @default.
- W2001068000 hasConcept C95623464 @default.
- W2001068000 hasConceptScore W2001068000C106131492 @default.
- W2001068000 hasConceptScore W2001068000C111219384 @default.
- W2001068000 hasConceptScore W2001068000C119857082 @default.
- W2001068000 hasConceptScore W2001068000C137293760 @default.
- W2001068000 hasConceptScore W2001068000C140779682 @default.
- W2001068000 hasConceptScore W2001068000C154945302 @default.
- W2001068000 hasConceptScore W2001068000C204321447 @default.
- W2001068000 hasConceptScore W2001068000C23224414 @default.
- W2001068000 hasConceptScore W2001068000C2777530160 @default.
- W2001068000 hasConceptScore W2001068000C28490314 @default.
- W2001068000 hasConceptScore W2001068000C31972630 @default.
- W2001068000 hasConceptScore W2001068000C41008148 @default.
- W2001068000 hasConceptScore W2001068000C542774811 @default.
- W2001068000 hasConceptScore W2001068000C95623464 @default.
- W2001068000 hasIssue "4" @default.
- W2001068000 hasLocation W20010680001 @default.
- W2001068000 hasOpenAccess W2001068000 @default.
- W2001068000 hasPrimaryLocation W20010680001 @default.
- W2001068000 hasRelatedWork W2001068000 @default.
- W2001068000 hasRelatedWork W2001732961 @default.