Matches in SemOpenAlex for { <https://semopenalex.org/work/W1502365051> ?p ?o ?g. }
- W1502365051 abstract "This thesis focuses on the study and use of the Causal State Splitting Reconstruction (CSSR) algorithm for Natural Language Processing (NLP) tasks. CSSR is an algorithm that captures patterns from data by building automata in the form of visible Markov Models. It is based on the principles of Computational Mechanics and takes advantage of many properties of causal state theory. One of the main advantages of CSSR with respect to Markov Models is that it builds states containing more than one $n$-gram (called a history in computational mechanics), so the obtained automata are much smaller than the equivalent Markov Model. In this work, we first study the behavior of the algorithm when learning patterns related to NLP tasks but without performing any annotation task. These first experiments are useful to understand the parameters that affect the algorithm and to check that it is able to capture the patterns present in natural language sentences. Secondly, we propose a way to apply CSSR to NLP annotation tasks. The algorithm was not originally conceived to use the hidden information necessary for annotation tasks, so we devised a way to introduce it into the system in order to obtain automata including this information, which can afterwards be used to annotate new text. Also, some methods to deal with unseen events and a modification of the algorithm to make it more suitable for NLP tasks are presented and tested. These three aspects constitute the first line of contributions of this research, together with a thorough experimental study of the proposed methods. The experimental study of the proposed approach is performed on three different tasks: Named Entity Recognition in the general domain, Named Entity Recognition in the biomedical domain, and Chunking. The obtained results are promising in the first two tasks, though not so good for Chunking.
Nevertheless, it is not easy to improve the obtained performance following the same approach, since CSSR needs quite reduced feature sets to build correct automata, and that limits the performance of the developed system. For that reason, we propose to combine CSSR with graphical models in order to enrich the features that the system can take into account. This combination constitutes the second line of contributions of this thesis. There is a variety of possible graphical models that can be used, but for the moment we propose to combine the CSSR algorithm with Maximum Entropy (ME) models. ME models can be used as a way of introducing more information into the system, encoding it as features. In this line, we propose and test two methods for combining CSSR and ME models in order to improve the results obtained with the original CSSR. The first method is simple and does not modify the automata-building algorithm, while the second one is more sophisticated and builds automata taking into account the ME features. We will see that, though much simpler, the first method leads to an important improvement with respect to the original CSSR, whereas the second method does not." @default.
- W1502365051 created "2016-06-24" @default.
- W1502365051 creator A5006867871 @default.
- W1502365051 date "2008-01-01" @default.
- W1502365051 modified "2023-09-23" @default.
- W1502365051 title "Applying causal-state splitting reconstruction algorithm to natural language processing tasks" @default.
- W1502365051 cites W102026425 @default.
- W1502365051 cites W117253061 @default.
- W1502365051 cites W123309796 @default.
- W1502365051 cites W12821309 @default.
- W1502365051 cites W14030242 @default.
- W1502365051 cites W147167782 @default.
- W1502365051 cites W1491547392 @default.
- W1502365051 cites W1505083828 @default.
- W1502365051 cites W1524281572 @default.
- W1502365051 cites W1527478795 @default.
- W1502365051 cites W1529355025 @default.
- W1502365051 cites W1530960090 @default.
- W1502365051 cites W1538382016 @default.
- W1502365051 cites W1538639190 @default.
- W1502365051 cites W1540449438 @default.
- W1502365051 cites W1541018942 @default.
- W1502365051 cites W1542537254 @default.
- W1502365051 cites W1543515964 @default.
- W1502365051 cites W1544752668 @default.
- W1502365051 cites W1568620938 @default.
- W1502365051 cites W1570690983 @default.
- W1502365051 cites W1574901103 @default.
- W1502365051 cites W1576523092 @default.
- W1502365051 cites W1583380718 @default.
- W1502365051 cites W1593927401 @default.
- W1502365051 cites W1602822984 @default.
- W1502365051 cites W1606142945 @default.
- W1502365051 cites W1608908834 @default.
- W1502365051 cites W1612238087 @default.
- W1502365051 cites W1623072288 @default.
- W1502365051 cites W184397588 @default.
- W1502365051 cites W1847996525 @default.
- W1502365051 cites W185041730 @default.
- W1502365051 cites W189110383 @default.
- W1502365051 cites W1912162275 @default.
- W1502365051 cites W1934019294 @default.
- W1502365051 cites W1961810468 @default.
- W1502365051 cites W1966853641 @default.
- W1502365051 cites W1968801347 @default.
- W1502365051 cites W1972407269 @default.
- W1502365051 cites W197270748 @default.
- W1502365051 cites W1972853378 @default.
- W1502365051 cites W1973702599 @default.
- W1502365051 cites W1976339715 @default.
- W1502365051 cites W1976642235 @default.
- W1502365051 cites W1977896902 @default.
- W1502365051 cites W1978470410 @default.
- W1502365051 cites W1979559232 @default.
- W1502365051 cites W1981320868 @default.
- W1502365051 cites W1982982698 @default.
- W1502365051 cites W1983073929 @default.
- W1502365051 cites W1983913270 @default.
- W1502365051 cites W1988995507 @default.
- W1502365051 cites W1990345192 @default.
- W1502365051 cites W1994330161 @default.
- W1502365051 cites W1995249715 @default.
- W1502365051 cites W1997090104 @default.
- W1502365051 cites W1999987298 @default.
- W1502365051 cites W2000183479 @default.
- W1502365051 cites W2001616149 @default.
- W1502365051 cites W2001792610 @default.
- W1502365051 cites W2003499201 @default.
- W1502365051 cites W2004111828 @default.
- W1502365051 cites W2004384146 @default.
- W1502365051 cites W2006960009 @default.
- W1502365051 cites W2008830554 @default.
- W1502365051 cites W2010665722 @default.
- W1502365051 cites W2011485152 @default.
- W1502365051 cites W2013462170 @default.
- W1502365051 cites W2017603160 @default.
- W1502365051 cites W2020278455 @default.
- W1502365051 cites W2021331223 @default.
- W1502365051 cites W2028122758 @default.
- W1502365051 cites W2031203571 @default.
- W1502365051 cites W2032558547 @default.
- W1502365051 cites W2036453285 @default.
- W1502365051 cites W2038705330 @default.
- W1502365051 cites W2040542896 @default.
- W1502365051 cites W2041614298 @default.
- W1502365051 cites W2042188227 @default.
- W1502365051 cites W2042606459 @default.
- W1502365051 cites W2045993505 @default.
- W1502365051 cites W2047706513 @default.
- W1502365051 cites W2047782770 @default.
- W1502365051 cites W2050331639 @default.
- W1502365051 cites W2056451646 @default.
- W1502365051 cites W2057217674 @default.
- W1502365051 cites W2057627381 @default.
- W1502365051 cites W2061271742 @default.
- W1502365051 cites W2063307689 @default.
- W1502365051 cites W2065143466 @default.
- W1502365051 cites W2066539191 @default.
- W1502365051 cites W2066560058 @default.
- W1502365051 cites W2068882115 @default.