Matches in SemOpenAlex for { <https://semopenalex.org/work/W126065117> ?p ?o ?g. }
Showing items 1 to 86 of 86 with 100 items per page.
- W126065117 abstract "Most existing methods for ontology learning from textual documents rely on natural language analysis. We extend these approaches by taking into account the document structure, which carries additional knowledge. The documents that we deal with are XML specifications of databases. In addition to classical linguistic clues, the structural organization of such documents also helps convey meaning. In a first stage, we characterize the semantics of XML mark-up tags and of their relations. Then parsing rules are defined to exploit the XML structure of documents and to create ontology concepts and semantic relations. These rules make it possible to automatically learn a kernel of ontology from documents. In a second stage, this ontology is enriched with the results of text analysis by lexico-syntactic patterns. Both ontology learning rules and patterns are implemented in the Gate platform. INTRODUCTION Ontology learning from text has been investigated since around 2000, with early works like the Terminae (Aussenac-Gilles, Despres and Szulman, 2008) and the Text-to-Onto methods and tools, and several reference books like (Buitelaar, Cimiano and Magnini, 2005). These methods define how to select and combine relevant natural language processing (NLP) tools to find linguistic clues for ontology items, or, better, to automatically learn and enrich an ontology. High-level tasks, like term or relation extraction (Bourigault, 2002), combine several basic text processing steps. Relation extraction plays a major role in structuring the ontology with hierarchical and other kinds of semantic relations, in assigning properties to concepts, and in identifying concepts. 
Relation extraction techniques (Grefenstette, 1994) include statistics (looking for repeated segments or meaningful predicate argument structures (Hindle, 1990)), robust or shallow linguistic analyses (mainly pattern matching on syntactically tagged corpora) (Giuliano, Lavelli and Romano, 2006) and learning (to learn new patterns from tagged corpora) (Nedellec and Nazarenko, 2003). A recent state-of-the-art review of pattern-based relation extraction from text (Auger and Barriere, 2008) shows that a pattern may correspond to very different characterizations of how a semantic relation may be expressed in a given language and corpus. A pattern defines a way to explore a sequence of words, lemmas, POS tags, syntactic relations, or semantic classes. These patterns are often defined or checked by manual text browsing, although many linguists tend to use simple and efficient tools like concordancers (Daoust, 1996), KeyWordsInContext like SystemQuick (Ahmad and Holmes-Higgin, 1995), or basic text browsing functions in text editors. A major assumption is that each pattern occurrence should appear within one sentence. But a text is much richer than a list of sentences (Charolles, 1997): its material presentation (Virbel and Luc, 2001), the sentence and paragraph sequencing (the discourse structure) (Asher, Busquet and Vieu, 2001), as well as the context surrounding the reader contribute to the interpretation process. Such features also contribute to relation identification and should be included in pattern definitions. We propose here an approach which takes into account both the material structure of a document and its textual content. In fact, structural tags implemented in a document (section title, subsection title, enumeration, etc.) express hierarchical relations on which we rely to elaborate a first ontology kernel. Furthermore, text analysis makes it possible to enrich this ontology. 
We test our approach on database specification documents (in the scope of the GEONTO project) where the database structure is reflected by the document structure and constraints on the database content are expressed in natural language. A first evaluation of the tool that implements the method shows some strengths and limitations that suggest directions for future work. METHODOLOGY We propose a method for ontology learning that combines two complementary document analyses: the first one bears on the document structure when it is described using languages such as HTML, SGML, or XML, taking advantage of the semantics of tags and their relations; the second one explores the document textual content by processing natural language. Each process is carried out independently thanks to a specific set of rules that lead to the definition of concepts and relations in an ontology. Rules for parsing XML Document The markup language provides a description of both the text structure and the relationships between the tagged textual units thanks to the tree structure of the tags. In the case where tags mark textual units which are short phrases that correspond to linguistic formulations of concepts or relations, semantic relations can be defined thanks to specializations of the following prototypical rule: When A and B are tags, B being covered by A, and C1 and C2 are concepts respectively labelled by the text marked by A and B, then a semantic relation exists between C1 and C2. Specializing this rule requires human reading and interpretation of the tags and their relations to define a set of extraction rules. Indeed, the semantics conveyed by the tags in the tag tree depends on the context. But once these rules are written for a type of document compliant with an XML schema, they can automatically analyse any valid corpus compliant with this schema, and provide a core ontology for each document of that type. 
Rules for Natural Language Processing The body of an XML document corresponds to natural language text and may contain relevant information for enriching the ontology obtained at the end of the previous step. According to Barriere and Agbado (2006), knowledge-rich contexts are text fragments that contain linguistic markers of semantic relations. We choose to use lexico-syntactic patterns to identify semantic relations in these text fragments. A lexico-syntactic pattern describes a regular expression, composed of words, syntactic or semantic categories, and typographic symbols, to identify text fragments matching this format. These features are assigned by various NLP tools (tokenizer, parser, tagger, etc.). We defined a set of patterns for three basic semantic relations: hypernymy, meronymy, and functional relations. Text analysis with these patterns enriches the ontology kernel with new concepts and relations. EXPERIMENTAL CONTEXT Within the GEONTO project (http://geonto.lri.fr/), one of the partners owns heterogeneous geographical databases and aims at reaching interoperability among them. The GEONTO partners have planned an ontology-based solution: one ontology will be built up for each database and should reflect its content as much as possible; then these ontologies will be mapped to a unique reference ontology. A – Road thoroughfare" @default.
- W126065117 created "2016-06-24" @default.
- W126065117 creator A5002911778 @default.
- W126065117 creator A5023321275 @default.
- W126065117 date "2009-01-01" @default.
- W126065117 modified "2023-09-25" @default.
- W126065117 title "ONTOLOGY LEARNING BY ANALYZING XML DOCUMENT STRUCTURE AND CONTENT" @default.
- W126065117 cites W1480596212 @default.
- W126065117 cites W1493270114 @default.
- W126065117 cites W1556809471 @default.
- W126065117 cites W2006258148 @default.
- W126065117 cites W2123084125 @default.
- W126065117 cites W2163953154 @default.
- W126065117 cites W2223098927 @default.
- W126065117 cites W2340260210 @default.
- W126065117 cites W2395210059 @default.
- W126065117 cites W2476065682 @default.
- W126065117 cites W291570921 @default.
- W126065117 cites W54865250 @default.
- W126065117 cites W87940005 @default.
- W126065117 cites W91519317 @default.
- W126065117 cites W202112495 @default.
- W126065117 doi "https://doi.org/10.5220/0002293301590165" @default.
- W126065117 hasPublicationYear "2009" @default.
- W126065117 type Work @default.
- W126065117 sameAs 126065117 @default.
- W126065117 citedByCount "7" @default.
- W126065117 countsByYear W1260651172015 @default.
- W126065117 countsByYear W1260651172016 @default.
- W126065117 countsByYear W1260651172018 @default.
- W126065117 crossrefType "proceedings-article" @default.
- W126065117 hasAuthorship W126065117A5002911778 @default.
- W126065117 hasAuthorship W126065117A5023321275 @default.
- W126065117 hasConcept C111472728 @default.
- W126065117 hasConcept C11508877 @default.
- W126065117 hasConcept C136764020 @default.
- W126065117 hasConcept C137441365 @default.
- W126065117 hasConcept C138885662 @default.
- W126065117 hasConcept C23123220 @default.
- W126065117 hasConcept C25810664 @default.
- W126065117 hasConcept C34716815 @default.
- W126065117 hasConcept C41008148 @default.
- W126065117 hasConcept C55348073 @default.
- W126065117 hasConcept C68699486 @default.
- W126065117 hasConcept C84314905 @default.
- W126065117 hasConcept C8797682 @default.
- W126065117 hasConceptScore W126065117C111472728 @default.
- W126065117 hasConceptScore W126065117C11508877 @default.
- W126065117 hasConceptScore W126065117C136764020 @default.
- W126065117 hasConceptScore W126065117C137441365 @default.
- W126065117 hasConceptScore W126065117C138885662 @default.
- W126065117 hasConceptScore W126065117C23123220 @default.
- W126065117 hasConceptScore W126065117C25810664 @default.
- W126065117 hasConceptScore W126065117C34716815 @default.
- W126065117 hasConceptScore W126065117C41008148 @default.
- W126065117 hasConceptScore W126065117C55348073 @default.
- W126065117 hasConceptScore W126065117C68699486 @default.
- W126065117 hasConceptScore W126065117C84314905 @default.
- W126065117 hasConceptScore W126065117C8797682 @default.
- W126065117 hasLocation W1260651171 @default.
- W126065117 hasOpenAccess W126065117 @default.
- W126065117 hasPrimaryLocation W1260651171 @default.
- W126065117 hasRelatedWork W1591196636 @default.
- W126065117 hasRelatedWork W1593730672 @default.
- W126065117 hasRelatedWork W1596876523 @default.
- W126065117 hasRelatedWork W181842503 @default.
- W126065117 hasRelatedWork W2027266484 @default.
- W126065117 hasRelatedWork W2052060145 @default.
- W126065117 hasRelatedWork W21497345 @default.
- W126065117 hasRelatedWork W2186891839 @default.
- W126065117 hasRelatedWork W2323688855 @default.
- W126065117 hasRelatedWork W2551371785 @default.
- W126065117 hasRelatedWork W2791640004 @default.
- W126065117 hasRelatedWork W2809981113 @default.
- W126065117 hasRelatedWork W2904198106 @default.
- W126065117 hasRelatedWork W2963454843 @default.
- W126065117 hasRelatedWork W3008867541 @default.
- W126065117 hasRelatedWork W3030113794 @default.
- W126065117 hasRelatedWork W3160234751 @default.
- W126065117 hasRelatedWork W3172782924 @default.
- W126065117 hasRelatedWork W3124908281 @default.
- W126065117 hasRelatedWork W990241657 @default.
- W126065117 isParatext "false" @default.
- W126065117 isRetracted "false" @default.
- W126065117 magId "126065117" @default.
- W126065117 workType "article" @default.
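
The prototypical rule quoted in the abstract (when tag B is covered by tag A, a semantic relation exists between the concepts labelled by their text) can be illustrated with a minimal Python sketch. This is not the authors' Gate implementation: the tag names `class` and `subclass`, the sample document, the function `extract_relations` and the generic relation name `relatedTo` are all assumptions made for illustration. As the abstract notes, choosing the actual relation type for each pair of tags requires human interpretation of the schema.

```python
import xml.etree.ElementTree as ET


def extract_relations(xml_text, relation="relatedTo"):
    """Return (C1, relation, C2) triples for every parent/child tag pair.

    C1 and C2 are the concept labels taken from the text directly marked
    by the covering tag (A) and the covered tag (B).
    """
    root = ET.fromstring(xml_text)
    triples = []

    def label(elem):
        # Text that appears in the element before its first child tag.
        return (elem.text or "").strip()

    def walk(parent):
        for child in parent:
            c1, c2 = label(parent), label(child)
            if c1 and c2:  # skip tags that carry no label text
                triples.append((c1, relation, c2))
            walk(child)

    walk(root)
    return triples


# Hypothetical sample document; the tag names are illustrative only.
SAMPLE = """<class>Road
  <subclass>Highway</subclass>
  <subclass>Street
    <subclass>Alley</subclass>
  </subclass>
</class>"""

if __name__ == "__main__":
    for triple in extract_relations(SAMPLE):
        print(triple)
```

A specialized rule would replace the generic `relatedTo` with a relation chosen per tag pair, for example a hierarchical relation when a subsection title is nested under a section title.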