Matches in SemOpenAlex for { <https://semopenalex.org/work/W2162198792> ?p ?o ?g. }
- W2162198792 abstract "Information extraction (IE) is the task of automatically extracting specific target information from texts by means of various natural language processing (NLP) and machine learning (ML) techniques. The huge amount of available biomedical and clinical texts is an important source of undiscovered knowledge and an interesting domain where IE techniques can be applied. Although there has been a considerable amount of work on IE for other genres of text (such as newspaper articles), results of the state-of-the-art approaches for some of the IE tasks show there is still a need for improvement. Moreover, when these IE approaches are applied directly to biomedical/clinical data, the performance drops considerably. Customization of the IE approaches with biomedical/clinical genre-specific features and pre/post-processing techniques does improve the results (with respect to applying the approaches directly), but the situation is still not completely satisfactory. There are many ways to pursue further improvement (e.g. exploitation of scope of negations, discourse structure, semantic roles, etc.) which are yet to be fully harnessed in IE systems. Additional challenges come from the usage of ML techniques themselves. Imbalance in data distribution is quite common in many NLP (including IE) tasks. Previous studies have empirically shown that unbalanced datasets lead to poor performance for the minority class. In this PhD research, we aim to address the open issues outlined above. We focus on three core IE tasks which are crucial for text mining: named entity recognition (NER), coreference resolution (CoRef), and relation extraction (RE). For NER, we propose an approach for the recognition of disease entity mentions which achieves state-of-the-art performance and is later exploited as a component in our RE system. 
Our NER system also achieves results on par with the state of the art for other bio-entity types such as genes/proteins, species and drugs. Since the creation of manually annotated training data is a costly process, we also investigate the practical usability of automatically annotated corpora for NER and propose how to automatically improve the quality of such corpora. CoRef, which is naturally the next step after NER, is often deemed one of the stumbling blocks for other IE tasks such as RE. We propose a greedy and constrained CoRef approach that achieves high results in clinical texts for each individual entity mention type and for each of the four different evaluation metrics usually computed for assessing systems' performance. As for RE, one of the fundamental characteristics of our approach is that we propose to exploit other NLP areas such as scope of negations, elementary discourse units and semantic roles. We propose a novel hybrid kernel that not only takes advantage of different types of information (syntactic, semantic, contextual, etc.) but also of the different ways they can be represented (i.e. flat structure, tree, graph). Our approach yields significantly better results than the previous state-of-the-art approaches for drug-drug interaction and protein-protein interaction extraction tasks. In each of the above tasks, we concentrate on developing proactive IE approaches that automatically discard unnecessary training/test instances even before training ML models and using those models on test data. This yields better performance, because the data distribution becomes less skewed, as well as faster runtime. We tested our NER and RE approaches on other genres of text such as newspaper articles and automatically transcribed broadcast news. The results show that our approaches are largely domain independent." @default.
- W2162198792 created "2016-06-24" @default.
- W2162198792 creator A5073915390 @default.
- W2162198792 date "2013-04-10" @default.
- W2162198792 modified "2023-09-26" @default.
- W2162198792 title "Improving the Effectiveness of Information Extraction from Biomedical Text" @default.
- W2162198792 cites W107258648 @default.
- W2162198792 cites W110692952 @default.
- W2162198792 cites W113629972 @default.
- W2162198792 cites W122290181 @default.
- W2162198792 cites W127334369 @default.
- W2162198792 cites W131482884 @default.
- W2162198792 cites W137105571 @default.
- W2162198792 cites W14854443 @default.
- W2162198792 cites W1491611863 @default.
- W2162198792 cites W1493270114 @default.
- W2162198792 cites W1493490255 @default.
- W2162198792 cites W1495981708 @default.
- W2162198792 cites W1497603085 @default.
- W2162198792 cites W1502412648 @default.
- W2162198792 cites W1510073064 @default.
- W2162198792 cites W1529842856 @default.
- W2162198792 cites W154351976 @default.
- W2162198792 cites W1550258693 @default.
- W2162198792 cites W1552841624 @default.
- W2162198792 cites W1566346388 @default.
- W2162198792 cites W1574715085 @default.
- W2162198792 cites W1574862351 @default.
- W2162198792 cites W1574989980 @default.
- W2162198792 cites W1578835049 @default.
- W2162198792 cites W1579429723 @default.
- W2162198792 cites W1598003989 @default.
- W2162198792 cites W1672757658 @default.
- W2162198792 cites W168632859 @default.
- W2162198792 cites W1715818909 @default.
- W2162198792 cites W1792818653 @default.
- W2162198792 cites W1835371444 @default.
- W2162198792 cites W183893648 @default.
- W2162198792 cites W1850865022 @default.
- W2162198792 cites W1885010341 @default.
- W2162198792 cites W1919152067 @default.
- W2162198792 cites W1931477211 @default.
- W2162198792 cites W1954622760 @default.
- W2162198792 cites W1986704581 @default.
- W2162198792 cites W1987170279 @default.
- W2162198792 cites W1988432728 @default.
- W2162198792 cites W1990794790 @default.
- W2162198792 cites W1991154713 @default.
- W2162198792 cites W1996787131 @default.
- W2162198792 cites W2011726136 @default.
- W2162198792 cites W2012012028 @default.
- W2162198792 cites W2012688074 @default.
- W2162198792 cites W2017231698 @default.
- W2162198792 cites W2020278455 @default.
- W2162198792 cites W2022166150 @default.
- W2162198792 cites W2026257418 @default.
- W2162198792 cites W2032566933 @default.
- W2162198792 cites W2036935277 @default.
- W2162198792 cites W2037987185 @default.
- W2162198792 cites W2038721957 @default.
- W2162198792 cites W2040884411 @default.
- W2162198792 cites W2042972234 @default.
- W2162198792 cites W2044420612 @default.
- W2162198792 cites W2045016337 @default.
- W2162198792 cites W2046747418 @default.
- W2162198792 cites W2048059249 @default.
- W2162198792 cites W2048140075 @default.
- W2162198792 cites W2048679005 @default.
- W2162198792 cites W2049645944 @default.
- W2162198792 cites W2053238041 @default.
- W2162198792 cites W2053724458 @default.
- W2162198792 cites W2056616115 @default.
- W2162198792 cites W2067982155 @default.
- W2162198792 cites W2072823612 @default.
- W2162198792 cites W2075472325 @default.
- W2162198792 cites W2076127082 @default.
- W2162198792 cites W2078017455 @default.
- W2162198792 cites W2092481996 @default.
- W2162198792 cites W2094728533 @default.
- W2162198792 cites W2096814387 @default.
- W2162198792 cites W2097606805 @default.
- W2162198792 cites W2097960255 @default.
- W2162198792 cites W2099369363 @default.
- W2162198792 cites W2102708424 @default.
- W2162198792 cites W2106419350 @default.
- W2162198792 cites W2107005506 @default.
- W2162198792 cites W2107598941 @default.
- W2162198792 cites W2107658650 @default.
- W2162198792 cites W2108211831 @default.
- W2162198792 cites W2110119381 @default.
- W2162198792 cites W2110120974 @default.
- W2162198792 cites W2110279753 @default.
- W2162198792 cites W2110871096 @default.
- W2162198792 cites W2113234275 @default.
- W2162198792 cites W2114388055 @default.
- W2162198792 cites W2116159459 @default.
- W2162198792 cites W2116786260 @default.
- W2162198792 cites W2117770626 @default.
- W2162198792 cites W2120814856 @default.
- W2162198792 cites W2121802207 @default.