Matches in SemOpenAlex for { <https://semopenalex.org/work/W2105372662> ?p ?o ?g. }
- W2105372662 abstract "This thesis investigates how the sub-structure of words can be accounted for in probabilistic models of language. Such models play an important role in natural language processing tasks such as translation or speech recognition, but often rely on the simplistic assumption that words are opaque symbols. This assumption does not fit morphologically complex language well, where words can have rich internal structure and sub-word elements are shared across distinct word forms. Our approach is to encode basic notions of morphology into the assumptions of three different types of language models, with the intention that leveraging shared sub-word structure can improve model performance and help overcome data sparsity that arises from morphological processes. In the context of n-gram language modelling, we formulate a new Bayesian model that relies on the decomposition of compound words to attain better smoothing, and we develop a new distributed language model that learns vector representations of morphemes and leverages them to link together morphologically related words. In both cases, we show that accounting for word sub-structure improves the models' intrinsic performance and provides benefits when applied to other tasks, including machine translation. We then shift the focus beyond the modelling of word sequences and consider models that automatically learn what the sub-word elements of a given language are, given an unannotated list of words. We formulate a novel model that can learn discontiguous morphemes in addition to the more conventional contiguous morphemes that most previous models are limited to. This approach is demonstrated on Semitic languages, and we find that modelling discontiguous sub-word structures leads to improvements in the task of segmenting words into their contiguous morphemes." @default.
- W2105372662 created "2016-06-24" @default.
- W2105372662 creator A5043555135 @default.
- W2105372662 date "2015-08-18" @default.
- W2105372662 modified "2023-09-27" @default.
- W2105372662 title "Probabilistic modelling of morphologically rich languages" @default.
- W2105372662 cites W107550075 @default.
- W2105372662 cites W139293362 @default.
- W2105372662 cites W1505680913 @default.
- W2105372662 cites W1508382620 @default.
- W2105372662 cites W1508567213 @default.
- W2105372662 cites W1530250655 @default.
- W2105372662 cites W1536719366 @default.
- W2105372662 cites W1558584194 @default.
- W2105372662 cites W1562769351 @default.
- W2105372662 cites W1570802506 @default.
- W2105372662 cites W1571975558 @default.
- W2105372662 cites W1575798196 @default.
- W2105372662 cites W1583697620 @default.
- W2105372662 cites W1597533204 @default.
- W2105372662 cites W1608322251 @default.
- W2105372662 cites W1614298861 @default.
- W2105372662 cites W162009636 @default.
- W2105372662 cites W1631260214 @default.
- W2105372662 cites W1633328346 @default.
- W2105372662 cites W1662133657 @default.
- W2105372662 cites W1665214252 @default.
- W2105372662 cites W1707124376 @default.
- W2105372662 cites W1719940802 @default.
- W2105372662 cites W1727944201 @default.
- W2105372662 cites W1753482797 @default.
- W2105372662 cites W1763771263 @default.
- W2105372662 cites W1766656764 @default.
- W2105372662 cites W179875071 @default.
- W2105372662 cites W1815076433 @default.
- W2105372662 cites W1836307405 @default.
- W2105372662 cites W1886986916 @default.
- W2105372662 cites W189092062 @default.
- W2105372662 cites W1892363745 @default.
- W2105372662 cites W1916559533 @default.
- W2105372662 cites W1934041838 @default.
- W2105372662 cites W195465510 @default.
- W2105372662 cites W1964209958 @default.
- W2105372662 cites W1969608442 @default.
- W2105372662 cites W1975139803 @default.
- W2105372662 cites W1975638594 @default.
- W2105372662 cites W1978400666 @default.
- W2105372662 cites W1978470410 @default.
- W2105372662 cites W1983606164 @default.
- W2105372662 cites W1984052055 @default.
- W2105372662 cites W1984635093 @default.
- W2105372662 cites W2005902041 @default.
- W2105372662 cites W2012191724 @default.
- W2105372662 cites W2016856586 @default.
- W2105372662 cites W2030392576 @default.
- W2105372662 cites W2040711288 @default.
- W2105372662 cites W2050065334 @default.
- W2105372662 cites W2053218206 @default.
- W2105372662 cites W2053306448 @default.
- W2105372662 cites W2053921957 @default.
- W2105372662 cites W2054533749 @default.
- W2105372662 cites W2056250865 @default.
- W2105372662 cites W2069429561 @default.
- W2105372662 cites W2069712814 @default.
- W2105372662 cites W2072169887 @default.
- W2105372662 cites W2075201173 @default.
- W2105372662 cites W2078124036 @default.
- W2105372662 cites W2080012968 @default.
- W2105372662 cites W2080021477 @default.
- W2105372662 cites W2080100102 @default.
- W2105372662 cites W2082092506 @default.
- W2105372662 cites W2087309226 @default.
- W2105372662 cites W2089385274 @default.
- W2105372662 cites W2089440824 @default.
- W2105372662 cites W2091812280 @default.
- W2105372662 cites W2097835057 @default.
- W2105372662 cites W2100397666 @default.
- W2105372662 cites W2100714283 @default.
- W2105372662 cites W2100976324 @default.
- W2105372662 cites W2101105183 @default.
- W2105372662 cites W2102131037 @default.
- W2105372662 cites W2103078213 @default.
- W2105372662 cites W2103731025 @default.
- W2105372662 cites W2104441213 @default.
- W2105372662 cites W2109664771 @default.
- W2105372662 cites W2111668269 @default.
- W2105372662 cites W2113075298 @default.
- W2105372662 cites W2115979064 @default.
- W2105372662 cites W2116211107 @default.
- W2105372662 cites W2116361377 @default.
- W2105372662 cites W2117126688 @default.
- W2105372662 cites W2117130368 @default.
- W2105372662 cites W2118090838 @default.
- W2105372662 cites W2119825066 @default.
- W2105372662 cites W2120861206 @default.
- W2105372662 cites W2121227244 @default.
- W2105372662 cites W2121924470 @default.
- W2105372662 cites W2122093182 @default.
- W2105372662 cites W2122891480 @default.
- W2105372662 cites W2125573226 @default.