Matches in SemOpenAlex for { <https://semopenalex.org/work/W2151818087> ?p ?o ?g. }
Showing items 1 to 93 of 93 with 100 items per page.
- W2151818087 endingPage "74" @default.
- W2151818087 startingPage "47" @default.
- W2151818087 abstract "In this chapter we describe a computational grammar for Basque, and the first results obtained using it in the process of automatically acquiring subcategorization information about verbs and their associated sentence elements (arguments and adjuncts). The first part of this chapter (section 1) is devoted to describing Basque syntax and presenting the grammar we have developed. The grammar is partial in the sense that it cannot recognize every sentence in real texts, but it is capable of describing the main syntactic elements, such as noun phrases (NPs), prepositional phrases (PPs), and subordinate and simple sentences. This can be useful for several applications. Next, the syntactic grammar will be used by a syntactic analyzer (or parser) to automatically acquire information on verbal subcategorization from texts (section 2). The results will later be used by a linguist or processed by statistical filters. This work has been done by the IXA Natural Language Processing research group, which focuses on the application of automatic methods to the analysis of Basque. Compared to other languages (English, German, French, ...), Basque can be considered a minority language due to the following constraints: • Limited number of language users. This fact implies a reduced number of researchers/developers of computational linguistic tools. • Limited number of language resources, in the form of computational lexicons, grammars, corpora, annotated treebanks or dictionaries. These are the main reasons that have led the IXA group to develop automatic methods for the analysis of linguistic data. The work described in this chapter is part of this effort. 1 THE SYNTACTIC ANALYZER 1.1 A BRIEF INTRODUCTION TO COMPUTATIONAL SYNTAX The computational treatment of syntax has long been an area of research.
Since 1950, when the first automatic translation systems were created, many researchers have studied the syntactic relationships among words and the way they are combined to form sentences. However, the task was more difficult than expected. Nowadays, there is no system capable of syntactically analyzing any sentence in real texts, such as newspapers. At the moment, the best syntactic analyzers have been developed for English, but they find an unsolvable obstacle in the form of ambiguity, because many common sentences can produce tens or even hundreds of different syntactic analyses. In this context, we can distinguish two approaches to computational syntax, according to their main objective: • Full parsing. The aim is to construct more accurate and complete grammars and parsers, with the objective of syntactically analyzing any sentence. As we have noted earlier, the state of the art is still far from this objective. • Partial parsing. In many systems the objective is not to completely analyze a sentence, but to detect several syntactic elements, such as NPs, verb chains or simple sentences. These pieces of information, also called chunks (Abney 1997), are useful for several linguistic applications, such as information retrieval or speech synthesis. Regarding the main kind of knowledge employed, we can classify syntactic analyzers into four groups: • Unification-based analyzers (Shieber 1986). These systems are based on context-free grammars (Chomsky 1957) with the addition of information to syntactic elements and rules by means of feature structures (see subsection 1.2). • Finite state analyzers (Karttunen et al. 1997). They are mainly dedicated to partial parsing, that is, they typically distinguish the different components of a sentence. Grammars are defined using regular expressions. • Constraint grammar (Karlsson 1995).
To analyze a sentence, this formalism begins with all the options to analyze each individual word-form, and the task of the grammar is to discard as many options as possible until each word contains a single analysis that gives information about number, case, person and syntactic category. This formalism is called reductionistic because it starts from all the possibilities and ends only when the correct one is selected. • Statistical methods. These systems automatically acquire syntactic information (in the form of context-free grammars or regular expressions) from large corpora. The information thus obtained is used to analyze new sentences. Usually, statistical methods are not used in isolation, but combined with other methods (Collins 1997). The IXA natural language processing group has developed two syntactic analyzers for Basque, one using a unification-based formalism and another based on Constraint Grammar. Work on this second formalism is described in (Aduriz et al. 1997; Arriola 2000; Aduriz 2000; Aduriz and Arriola 2001). In this chapter we will describe a unification grammar for Basque together with its application to the task of automatically extracting verbal information from text corpora. Regarding computational grammars and syntactic analyzers for languages other than Basque, we can cite the following: • Natural Language Software Registry: http://registry.dfki.de • Computational Linguistics (on-line presentations): http://www.ifi.unizh.ch/CL/InteractiveTools.html#as-h2-3296 Alternatively, to experiment directly with a syntactic analyzer: • Syntactic analyzer for English: http://www.conexor.fi • Syntactic analyzer for Spanish (CliC): http://clic.fil.ub.es/equipo/index_en.shtml 1.2 UNIFICATION-BASED GRAMMAR FORMALISMS AND PATR Unification-based grammar formalisms are based on context-free grammars (CFG). CFGs were formalized by Chomsky (1957), and they define a grammar as shown in Table 1.
English grammar: S → NP VP; VP → Verb NP; NP → Noun; NP → Det Noun. Basque grammar: S → NP VP; VP → NP Verb; NP → Noun; NP → Pronoun. Table 1. Two examples of context-free grammars. Context-free rules are of the form ‘a → b’ or ‘a → b c’, where a is a non-terminal syntactic category and b, c are terminals (lexical elements) or non-terminals. Non-terminal symbols (S, NP, PP, ...) are syntactic categories, while terminals are words or morphemes from a lexicon. The chains of terminal symbols that can be derived from the first symbol (or axiom) of the grammar (S or sentence in the example) will be the sentences of the language. A sentence belonging to the grammar will be typically described by a tree. For example, Figure 1 shows an analysis tree of a sentence derived using the rules for the Basque grammar in Table 1." @default.
- W2151818087 created "2016-06-24" @default.
- W2151818087 creator A5009281472 @default.
- W2151818087 creator A5030328561 @default.
- W2151818087 creator A5085496182 @default.
- W2151818087 date "2004-01-01" @default.
- W2151818087 modified "2023-10-16" @default.
- W2151818087 title "PATRIXA: A UNIFICATION-BASED PARSER FOR BASQUE AND ITS APPLICATION TO THE AUTOMATIC ANALYSIS OF VERBS" @default.
- W2151818087 cites W143600764 @default.
- W2151818087 cites W1499641710 @default.
- W2151818087 cites W1527398610 @default.
- W2151818087 cites W1572747509 @default.
- W2151818087 cites W1622422412 @default.
- W2151818087 cites W1809415035 @default.
- W2151818087 cites W1972573551 @default.
- W2151818087 cites W201288405 @default.
- W2151818087 cites W2032527312 @default.
- W2151818087 cites W2038248725 @default.
- W2151818087 cites W2088198454 @default.
- W2151818087 cites W2096369514 @default.
- W2151818087 cites W2108455276 @default.
- W2151818087 cites W2120006020 @default.
- W2151818087 cites W2151157246 @default.
- W2151818087 cites W2163791257 @default.
- W2151818087 cites W2295375623 @default.
- W2151818087 cites W1589269291 @default.
- W2151818087 hasPublicationYear "2004" @default.
- W2151818087 type Work @default.
- W2151818087 sameAs 2151818087 @default.
- W2151818087 citedByCount "0" @default.
- W2151818087 crossrefType "journal-article" @default.
- W2151818087 hasAuthorship W2151818087A5009281472 @default.
- W2151818087 hasAuthorship W2151818087A5030328561 @default.
- W2151818087 hasAuthorship W2151818087A5085496182 @default.
- W2151818087 hasConcept C121934690 @default.
- W2151818087 hasConcept C138885662 @default.
- W2151818087 hasConcept C153962237 @default.
- W2151818087 hasConcept C154775046 @default.
- W2151818087 hasConcept C154945302 @default.
- W2151818087 hasConcept C155092808 @default.
- W2151818087 hasConcept C186644900 @default.
- W2151818087 hasConcept C204321447 @default.
- W2151818087 hasConcept C26022165 @default.
- W2151818087 hasConcept C2776397901 @default.
- W2151818087 hasConcept C2777530160 @default.
- W2151818087 hasConcept C41008148 @default.
- W2151818087 hasConcept C41895202 @default.
- W2151818087 hasConcept C60048249 @default.
- W2151818087 hasConcept C70845037 @default.
- W2151818087 hasConceptScore W2151818087C121934690 @default.
- W2151818087 hasConceptScore W2151818087C138885662 @default.
- W2151818087 hasConceptScore W2151818087C153962237 @default.
- W2151818087 hasConceptScore W2151818087C154775046 @default.
- W2151818087 hasConceptScore W2151818087C154945302 @default.
- W2151818087 hasConceptScore W2151818087C155092808 @default.
- W2151818087 hasConceptScore W2151818087C186644900 @default.
- W2151818087 hasConceptScore W2151818087C204321447 @default.
- W2151818087 hasConceptScore W2151818087C26022165 @default.
- W2151818087 hasConceptScore W2151818087C2776397901 @default.
- W2151818087 hasConceptScore W2151818087C2777530160 @default.
- W2151818087 hasConceptScore W2151818087C41008148 @default.
- W2151818087 hasConceptScore W2151818087C41895202 @default.
- W2151818087 hasConceptScore W2151818087C60048249 @default.
- W2151818087 hasConceptScore W2151818087C70845037 @default.
- W2151818087 hasLocation W21518180871 @default.
- W2151818087 hasOpenAccess W2151818087 @default.
- W2151818087 hasPrimaryLocation W21518180871 @default.
- W2151818087 hasRelatedWork W1490782328 @default.
- W2151818087 hasRelatedWork W1510636957 @default.
- W2151818087 hasRelatedWork W1608177823 @default.
- W2151818087 hasRelatedWork W1972372712 @default.
- W2151818087 hasRelatedWork W2007805332 @default.
- W2151818087 hasRelatedWork W2099721505 @default.
- W2151818087 hasRelatedWork W2170008788 @default.
- W2151818087 hasRelatedWork W2186287356 @default.
- W2151818087 hasRelatedWork W2250177462 @default.
- W2151818087 hasRelatedWork W2340909594 @default.
- W2151818087 hasRelatedWork W2377969832 @default.
- W2151818087 hasRelatedWork W2400287326 @default.
- W2151818087 hasRelatedWork W2708709431 @default.
- W2151818087 hasRelatedWork W2790702445 @default.
- W2151818087 hasRelatedWork W2905924055 @default.
- W2151818087 hasRelatedWork W2954581892 @default.
- W2151818087 hasRelatedWork W2994194174 @default.
- W2151818087 hasRelatedWork W3006624250 @default.
- W2151818087 hasRelatedWork W3142172934 @default.
- W2151818087 hasRelatedWork W1575822163 @default.
- W2151818087 isParatext "false" @default.
- W2151818087 isRetracted "false" @default.
- W2151818087 magId "2151818087" @default.
- W2151818087 workType "article" @default.
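
The context-free rules that the abstract gives for Basque in its Table 1 (S → NP VP, VP → NP Verb, NP → Noun, NP → Pronoun) can be tried out with a very small recognizer. The following is only a minimal sketch under stated assumptions: the three-word lexicon, the names `GRAMMAR`, `LEXICON`, `end_positions` and `recognizes`, and the backtracking strategy are illustrative choices made here. They are not taken from the PATRIXA parser, which the abstract describes as unification-based and which adds feature structures on top of rules like these.

```python
from typing import Dict, List, Set, Tuple

# Rules of the toy Basque grammar quoted in the abstract (Table 1).
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "S": [("NP", "VP")],
    "VP": [("NP", "Verb")],
    "NP": [("Noun",), ("Pronoun",)],
}

# Hypothetical mini-lexicon (an assumption for illustration only).
LEXICON: Dict[str, str] = {
    "nik": "Pronoun",    # "I" (ergative)
    "liburua": "Noun",   # "the book"
    "irakurri": "Verb",  # "read"
}


def end_positions(symbol: str, tags: List[str], start: int) -> Set[int]:
    """Return every position where `symbol` can end when it starts at `start`."""
    if symbol not in GRAMMAR:
        # Lexical category: it must match the tag at the current position.
        if start < len(tags) and tags[start] == symbol:
            return {start + 1}
        return set()
    results: Set[int] = set()
    for rhs in GRAMMAR[symbol]:
        positions = {start}
        # Thread the reachable positions through each right-hand-side symbol.
        for child in rhs:
            positions = {e for p in positions for e in end_positions(child, tags, p)}
        results |= positions
    return results


def recognizes(sentence: str) -> bool:
    """True if the whole sentence can be derived from the axiom S."""
    words = sentence.split()
    if any(word not in LEXICON for word in words):
        return False
    tags = [LEXICON[word] for word in words]
    return len(tags) in end_positions("S", tags, 0)


if __name__ == "__main__":
    print(recognizes("nik liburua irakurri"))   # verb-final order
    print(recognizes("nik irakurri liburua"))   # verb-medial order
```

Because none of the rules is left-recursive, plain backtracking terminates. With this grammar only the verb-final order is accepted, which is the difference between the Basque rule VP → NP Verb and the English rule VP → Verb NP in the same table.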