Matches in SemOpenAlex for { <https://semopenalex.org/work/W14854443> ?p ?o ?g. }
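The quad pattern above asks for every (predicate, object, graph) match on the work <https://semopenalex.org/work/W14854443>; the matches themselves are listed below. As a minimal sketch of how such a listing could be reproduced over the standard SPARQL 1.1 HTTP protocol (the endpoint URL https://semopenalex.org/sparql is an assumption about the public service, and the "@default" graph column is taken to be the store's default graph, so only predicate and object are selected):

```python
# Minimal sketch: fetch all (predicate, object) pairs for the work above.
# Assumptions: the public endpoint lives at https://semopenalex.org/sparql and
# speaks the standard SPARQL 1.1 protocol with JSON results.
import requests

ENDPOINT = "https://semopenalex.org/sparql"  # assumed endpoint URL
QUERY = """
SELECT ?p ?o WHERE {
  <https://semopenalex.org/work/W14854443> ?p ?o .
}
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
response.raise_for_status()

for binding in response.json()["results"]["bindings"]:
    print(binding["p"]["value"], binding["o"]["value"])
```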
- W14854443 abstract "Information Extraction, the task of locating textual mentions of specific types of entities and their relationships, aims at representing the information contained in text documents in a structured format that is more amenable to applications in data mining, question answering, or the semantic web. The goal of our research is to design information extraction models that obtain improved performance by exploiting types of evidence that have not been explored in previous approaches. Since designing an extraction system through introspection by a domain expert is a laborious and time consuming process, the focus of this thesis will be on methods that automatically induce an extraction model by training on a dataset of manually labeled examples. Named Entity Recognition is an information extraction task that is concerned with finding textual mentions of entities that belong to a predefined set of categories. We approach this task as a phrase classification problem, in which candidate phrases from the same document are collectively classified. Global correlations between candidate entities are captured in a model built using the expressive framework of Relational Markov Networks. Additionally, we propose a novel tractable approach to phrase classification for named entity recognition based on a special Junction Tree representation. Classifying entity mentions into a predefined set of categories achieves only a partial disambiguation of the names. This is further refined in the task of Named Entity Disambiguation, where names need to be linked to their actual denotations. In our research, we use Wikipedia as a repository of named entities and propose a ranking approach to disambiguation that exploits learned correlations between words from the name context and categories from the Wikipedia taxonomy. Relation Extraction refers to finding relevant relationships between entities mentioned in text documents. Our approaches to this information extraction task differ in the type and the amount of supervision required. We first propose two relation extraction methods that are trained on documents in which sentences are manually annotated for the required relationships. In the first method, the extraction patterns correspond to sequences of words and word classes anchored at two entity names occurring in the same sentence. These are used as implicit features in a generalized subsequence kernel, with weights computed through training of Support Vector Machines. In the second approach, the implicit extraction features are focused on the shortest path between the two entities in the word-word dependency graph of the sentence. Finally, in a significant departure from previous learning approaches to relation extraction, we propose reducing the amount of required supervision to only a handful of pairs of entities known to exhibit or not exhibit the desired relationship. Each pair is associated with a bag of sentences extracted automatically from a very large corpus. We extend the subsequence kernel to handle this weaker form of supervision, and describe a method for weighting features in order to focus on those correlated with the target relation rather than with the individual entities. The resulting Multiple Instance Learning approach offers a competitive alternative to previous relation extraction methods, at a significantly reduced cost in human supervision." @default.
- W14854443 created "2016-06-24" @default.
- W14854443 creator A5008715111 @default.
- W14854443 creator A5020435927 @default.
- W14854443 date "2007-01-01" @default.
- W14854443 modified "2023-09-26" @default.
- W14854443 title "Learning for information extraction: from named entity recognition and disambiguation to relation extraction" @default.
- W14854443 cites W102233799 @default.
- W14854443 cites W111380827 @default.
- W14854443 cites W1480643256 @default.
- W14854443 cites W1482174963 @default.
- W14854443 cites W1496147749 @default.
- W14854443 cites W1502876877 @default.
- W14854443 cites W1503428008 @default.
- W14854443 cites W1507028917 @default.
- W14854443 cites W1513861746 @default.
- W14854443 cites W1520377376 @default.
- W14854443 cites W1535599202 @default.
- W14854443 cites W1548663377 @default.
- W14854443 cites W1550588214 @default.
- W14854443 cites W1554039773 @default.
- W14854443 cites W1563088657 @default.
- W14854443 cites W1566346388 @default.
- W14854443 cites W1567277581 @default.
- W14854443 cites W1568013626 @default.
- W14854443 cites W1576520375 @default.
- W14854443 cites W1660390307 @default.
- W14854443 cites W1773803948 @default.
- W14854443 cites W1934019294 @default.
- W14854443 cites W1954715867 @default.
- W14854443 cites W1979987551 @default.
- W14854443 cites W1980452149 @default.
- W14854443 cites W1981082061 @default.
- W14854443 cites W1982678692 @default.
- W14854443 cites W1986543644 @default.
- W14854443 cites W1991154713 @default.
- W14854443 cites W1991383860 @default.
- W14854443 cites W1999595522 @default.
- W14854443 cites W2020999234 @default.
- W14854443 cites W2038721957 @default.
- W14854443 cites W2047221353 @default.
- W14854443 cites W2049633694 @default.
- W14854443 cites W2051434435 @default.
- W14854443 cites W2058856481 @default.
- W14854443 cites W2068737686 @default.
- W14854443 cites W2071993998 @default.
- W14854443 cites W2092654472 @default.
- W14854443 cites W2096175520 @default.
- W14854443 cites W2096765155 @default.
- W14854443 cites W2098678088 @default.
- W14854443 cites W2104884878 @default.
- W14854443 cites W2105947498 @default.
- W14854443 cites W2107909509 @default.
- W14854443 cites W2108745803 @default.
- W14854443 cites W2110119381 @default.
- W14854443 cites W2113227740 @default.
- W14854443 cites W2115792525 @default.
- W14854443 cites W2115880858 @default.
- W14854443 cites W2117400858 @default.
- W14854443 cites W2120814856 @default.
- W14854443 cites W2125838338 @default.
- W14854443 cites W2127713198 @default.
- W14854443 cites W2129712609 @default.
- W14854443 cites W2130337399 @default.
- W14854443 cites W2130913205 @default.
- W14854443 cites W2133138256 @default.
- W14854443 cites W2135932125 @default.
- W14854443 cites W2137807925 @default.
- W14854443 cites W2137813581 @default.
- W14854443 cites W2138627627 @default.
- W14854443 cites W2143075689 @default.
- W14854443 cites W2143349571 @default.
- W14854443 cites W2144087279 @default.
- W14854443 cites W2144578941 @default.
- W14854443 cites W2146191280 @default.
- W14854443 cites W2147152072 @default.
- W14854443 cites W2147880316 @default.
- W14854443 cites W2148540243 @default.
- W14854443 cites W2148603752 @default.
- W14854443 cites W2150588363 @default.
- W14854443 cites W2152211274 @default.
- W14854443 cites W2152269015 @default.
- W14854443 cites W2152455533 @default.
- W14854443 cites W2158188757 @default.
- W14854443 cites W2159080219 @default.
- W14854443 cites W2160745555 @default.
- W14854443 cites W2160842254 @default.
- W14854443 cites W2162685317 @default.
- W14854443 cites W2163362093 @default.
- W14854443 cites W2163780445 @default.
- W14854443 cites W2914367381 @default.
- W14854443 cites W2962735828 @default.
- W14854443 cites W55116438 @default.
- W14854443 cites W89857650 @default.
- W14854443 cites W900993354 @default.
- W14854443 hasPublicationYear "2007" @default.
- W14854443 type Work @default.
- W14854443 sameAs 14854443 @default.
- W14854443 citedByCount "11" @default.
- W14854443 countsByYear W148544432012 @default.
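The abstract above outlines two supervised relation extraction kernels. For the first, the implicit features are gap-weighted subsequences of words and word classes anchored at the two entity names, with weights learned by a Support Vector Machine. As a rough illustration only, not the thesis's implementation, here is a plain gap-weighted subsequence kernel over token sequences in the style of the classic string-kernel recursion; the word-class matching and the anchoring at entity names are omitted.

```python
from functools import lru_cache

def subsequence_kernel(s, t, n, lam=0.5):
    """Gap-weighted subsequence kernel of order n over two token sequences.

    Counts subsequences of length n that s and t have in common, down-weighting
    each occurrence by lam for every position it spans.  This follows the
    classic string-kernel recursion; the generalized kernel in the thesis also
    matches word classes and anchors subsequences at the two entity names,
    which this sketch leaves out.
    """
    s, t = tuple(s), tuple(t)

    @lru_cache(maxsize=None)
    def k_prime(i, a, b):
        # K'_i evaluated on the prefixes s[:a] and t[:b].
        if i == 0:
            return 1.0
        if a < i or b < i:
            return 0.0
        x = s[a - 1]
        total = lam * k_prime(i, a - 1, b)
        for j in range(1, b + 1):
            if t[j - 1] == x:
                total += k_prime(i - 1, a - 1, j - 1) * lam ** (b - j + 2)
        return total

    @lru_cache(maxsize=None)
    def k(order, a, b):
        # K_order evaluated on the prefixes s[:a] and t[:b].
        if a < order or b < order:
            return 0.0
        x = s[a - 1]
        total = k(order, a - 1, b)
        for j in range(1, b + 1):
            if t[j - 1] == x:
                total += k_prime(order - 1, a - 1, j - 1) * lam ** 2
        return total

    return k(n, len(s), len(t))


# The only length-2 subsequence shared by "cat" and "car" is ("c", "a");
# with lam = 0.5 its contribution is 0.5**2 * 0.5**2 = 0.0625.
print(subsequence_kernel("cat", "car", n=2))  # -> 0.0625
```

In the thesis's setting these kernel values would feed an SVM; the sketch shows the kernel in isolation. For the second kernel, the features come from the shortest path between the two entities in the sentence's word-word dependency graph. One common way to score two such paths is to treat each position as a set of features (word, word class, edge direction) and multiply the per-position overlap, with paths of unequal length scoring zero; the sketch below follows that idea, and both the omitted dependency parsing step and the example paths are illustrative assumptions rather than material from the thesis.

```python
def shortest_path_kernel(path_x, path_y):
    """Illustrative shortest-path-style kernel: paths of different length do
    not match at all; otherwise the score is the product, over positions,
    of how many features (word, word class, edge direction, ...) the two
    paths share at that position.  Each path is a list of feature sets."""
    if len(path_x) != len(path_y):
        return 0
    score = 1
    for feats_x, feats_y in zip(path_x, path_y):
        score *= len(feats_x & feats_y)
    return score


# Hypothetical dependency paths between two entity mentions; word positions
# carry {word, coarse word class}, edge positions carry the edge direction.
path_a = [{"protesters", "NOUN"}, {"<-"}, {"seized", "VERB"}, {"->"}, {"stations", "NOUN"}]
path_b = [{"troops", "NOUN"}, {"<-"}, {"raided", "VERB"}, {"->"}, {"churches", "NOUN"}]
print(shortest_path_kernel(path_a, path_b))  # each position shares one feature -> 1
```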