Matches in SemOpenAlex for { <https://semopenalex.org/work/W2274960229> ?p ?o ?g. }
Showing items 1 to 69 of 69 with 100 items per page.
- W2274960229 abstract "This paper describes a Feature Unification Based Word Grammar model for the morphological parsing of Bangla words. While the normal morphological parsing strategy is adequate to decompose a word into morphemes, it cannot directly compute the part of speech of a derivationally complex word or return a word's inflectional features, which is precisely the information required for syntactic parsing. These deficiencies have now been remedied by adding a unification-based word grammar component which can provide parse trees and feature structures. In addition, feature unification reduces the number of lexicon classes (saving space) and reduces the complexity of morphotactic analysis.

INTRODUCTION

Normal morphological parsing decomposes a word into morphemes, given a lexicon list, the proper lexicon order, and the spelling change rules. But this is not enough to compute the part of speech of a derivationally complex word or return a word's inflectional features. In this paper we discuss feature based morphological parsing for Bangla, which gives us parts of speech and other morphological features in addition to the morpheme division. [2][8] We first give an idea of normal morphological parsing, then discuss feature based morphological parsing, and finally compare the two approaches.

NORMAL MORPHOLOGICAL PARSER

In the normal morphological parser or generator there are 3 components: (1) Lexicon (2) Morphotactics (3) Orthographic Rules. [5]

1. Lexicon

The list of stems and affixes, together with basic information about them (whether a stem is a Noun stem or a Verb stem, etc.). Every lexicon entry belongs to a certain class. Example: hAt (হাট) Class: Verb_Stem or Root; Feature: Parts of Speech = Verb. All the lexicon entries in a certain class are stored in an FSA (Finite State Automaton).

2.
Morphotactics

The model of morpheme ordering that explains which classes of morphemes can follow other classes of morphemes inside a word. For example, the rule that the Bengali Tense_Person_Affixes follow the Verbs rather than preceding them. Normally morphotactics is implemented using Finite State Automata (FSA). For example, the following FSA can represent morphotactic analysis for Bangla:

Figure 1: FSA representing morphotactics

3. Orthographic Rules

These spelling rules model the changes that occur in a word, usually when two morphemes combine. For example, the root word hAt (হাট) changes into hEt (হেট) when a verb suffix is added to form the word hEtECI (হেটেছি).

PC_KIMMO version 1 implements this parsing strategy. [1][12]

1 Throughout this paper we use the English alphabet to represent Bangla characters. For example “আ” is “a”, “◌া” is “A”, “◌ি” is “I”, “ক” is “k”, “খ” is “K”, “য” is “y”, “◌্” (hasanta) is “~” etc. We also assume that the words are given in Unicode format (the vowel comes after the consonant). For example খেযেছি is represented as KEyECI.

FEATURE BASED MORPHOLOGICAL PARSING

This is a morphological parser which uses a unification based chart parser, given a proper word grammar. It adds an extra analytical component, the Word Grammar, to the three components of the normal parsing strategy described previously. Just as a sentence parser produces a parse tree with words as its leaf nodes, a word parser produces a parse tree with morphemes as its leaf nodes. When we parse a sentence, it is normally already tokenized into words (since we put white space between words); but when we parse a word, we must first tokenize it into morphemes. This tokenizing is done by the morphotactic and orthographic rules and the lexicon.
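The tokenizing step just described can be sketched in a few lines of Python. This is a minimal illustration, not the PC_KIMMO implementation: the two-entry lexicon, the state names, the affix form ECI (inferred from hEt + ECI = hEtECI) and the single A-to-E spelling rule are all assumptions made for the example.

```python
# Hypothetical mini-lexicon: class name -> lexical forms (romanized as in the paper).
LEXICON = {
    "Verb_Stem": {"hAt"},
    "Tense_Person_Affix": {"ECI"},  # affix form inferred from hEtECI; an assumption
}

# Morphotactic FSA: state -> {lexicon class: next state}. State names are made up.
MORPHOTACTICS = {
    "START": {"Verb_Stem": "STEM"},
    "STEM": {"Tense_Person_Affix": "END"},
}
FINAL_STATES = {"END"}


def to_surface(lexical, lex_class):
    """Toy orthographic rule (assumed): a verb stem's A surfaces as E before a suffix."""
    if lex_class == "Verb_Stem":
        return lexical.replace("A", "E")
    return lexical


def tokenize(word, state="START"):
    """Return every morpheme sequence the FSA accepts for the surface word."""
    if not word:
        return [[]] if state in FINAL_STATES else []
    analyses = []
    for lex_class, next_state in MORPHOTACTICS.get(state, {}).items():
        for lexical in sorted(LEXICON[lex_class]):
            surface = to_surface(lexical, lex_class)
            if word.startswith(surface):
                for rest in tokenize(word[len(surface):], next_state):
                    analyses.append([(lexical, lex_class)] + rest)
    return analyses


print(tokenize("hEtECI"))
```

Because the FSA only allows an affix after a stem, a word with the morphemes in the wrong order, or a bare stem with no affix, yields no analysis.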
When a surface word is submitted to a Recognizer, the rules and lexicon analyze the word into a sequence of morpheme structures (or possibly more than one sequence if more than one analysis is found). A morpheme structure consists of a lexical form, its gloss, its category, and its features. For example, the word anAdUnIktAr (অনাধুনিকতার) is tokenized into this sequence of morpheme structures.

Figure 2. Morpheme structure

Here cat, next_cat, to_cat and prev_cat are feature variables, and PF (prefix), ADJ (adjective), N_ADJ (both noun and adjective), N (noun), SF (suffix) and INF (inflection) are features. The features are described as follows:

cat: It specifies the category of a lexicon. Values include N, ADJ, V, P.
next_cat: It specifies the lexical category of the stems to which it can attach as a prefix. Values include N, ADJ, V, P.
prev_cat: It specifies the lexical category of the stems to which it can attach as a suffix. Values include N, ADJ, V, P.

This analysis (all the tokens) is then passed to the word grammar, which returns the parse tree and feature structure. The word grammar contains a rule list showing how to form a word, together with all the feature constraints. [8][5] We can use a chart parser to get a parse tree. For every node in the parse tree we have to ensure that no feature constraint is violated. The features of a node are derived from the features of its child nodes. So for a node in the parse tree we have to do two things:

(1) Feature Unification

Figure 3. Feature Unification

Feature unification checks whether the feature constraint specified in the parent node holds, given the features from the child nodes. For example, in the above picture, in the parent node we have to check whether feature F1 of node1 is equal to feature F2 of node2. If it is not, this parse tree is rejected.

(2) Feature Collection

Figure 4. Feature Collection

Feature collection gathers features from the child nodes.
For example, in the above picture, in the parent node feature F is equal to feature F1 of node1.

So for Bangla, if we define a word grammar like this [PC_KIMMO version 2]: [2][7][13]

Word = Stem INFL
  = //feature unification
  = //feature collect
Stem = Stem_1 SUFFIX
  <SUFFIX prev_cat> = <Stem_1 cat>
  <Stem cat> = <SUFFIX to_cat>
Stem_1 = PREFIX ADJECTIVE
  =
  =

Then after the chart parsing and feature unification we get the following parse tree and feature structure:

Fig 5: Parse tree and feature structure for anAdUnIktAr (অনাধুনিকতার).

Here we can see that after the final parsing the top node Word has the feature cat=N, which specifies that the final word's category is NOUN although its root word adUnIk is an ADJECTIVE. This is because the SUFFIX tA is added to the ADJECTIVE and changes it into a NOUN. The feature constraint responsible, already specified in the word grammar above, is repeated here:

Stem = Stem_1 SUFFIX
  <SUFFIX prev_cat> = <Stem_1 cat> //unification
  <Stem cat> = <SUFFIX to_cat> //feature collect

This states that the prev_cat feature of SUFFIX has to be the same as the cat feature of Stem_1, and that the cat feature of Stem is equal to the to_cat feature of SUFFIX. For the word anAdUnIktAr, anAdUnIk is Stem_1 and tA is SUFFIX. And after the normal parsing [as shown in Figure 2] we get the lexicon tA(তা) as" @default.
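The suffix rule discussed at the end of the abstract (Stem = Stem_1 SUFFIX) can be illustrated with a short Python sketch. It is a hedged illustration only: the dictionary representation and the function name combine_stem are hypothetical, and the feature values given to anAdUnIk and tA are assumptions inferred from the prose (an adjective stem and a suffix that attaches to adjectives and yields a noun), since the figures are not reproduced here.

```python
def combine_stem(stem_1, suffix):
    """Apply the rule Stem = Stem_1 SUFFIX; return Stem's features, or None on failure."""
    # Feature unification: prev_cat of SUFFIX must equal cat of Stem_1.
    if suffix["prev_cat"] != stem_1["cat"]:
        return None
    # Feature collection: cat of Stem is taken from to_cat of SUFFIX.
    return {"cat": suffix["to_cat"]}


# Assumed feature values, inferred from the text rather than copied from Figure 2.
stem_1 = {"form": "anAdUnIk", "cat": "ADJ"}
suffix = {"form": "tA", "cat": "SF", "prev_cat": "ADJ", "to_cat": "N"}

print(combine_stem(stem_1, suffix))
```

With these values the rule succeeds and the resulting Stem carries cat N, matching the statement that tA turns the adjective into a noun; a stem of any other category fails unification and the candidate parse is discarded.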
- W2274960229 created "2016-06-24" @default.
- W2274960229 creator A5021898950 @default.
- W2274960229 creator A5064371188 @default.
- W2274960229 date "2004-01-01" @default.
- W2274960229 modified "2023-09-22" @default.
- W2274960229 title "Feature unification for morphological parsing in Bangla" @default.
- W2274960229 cites W1484602304 @default.
- W2274960229 cites W1579838312 @default.
- W2274960229 cites W201288405 @default.
- W2274960229 cites W2092349690 @default.
- W2274960229 cites W2117985194 @default.
- W2274960229 cites W2151157246 @default.
- W2274960229 cites W2313708685 @default.
- W2274960229 hasPublicationYear "2004" @default.
- W2274960229 type Work @default.
- W2274960229 sameAs 2274960229 @default.
- W2274960229 citedByCount "2" @default.
- W2274960229 countsByYear W22749602292013 @default.
- W2274960229 crossrefType "journal-article" @default.
- W2274960229 hasAuthorship W2274960229A5021898950 @default.
- W2274960229 hasAuthorship W2274960229A5064371188 @default.
- W2274960229 hasConcept C138885662 @default.
- W2274960229 hasConcept C154945302 @default.
- W2274960229 hasConcept C186644900 @default.
- W2274960229 hasConcept C19235068 @default.
- W2274960229 hasConcept C199360897 @default.
- W2274960229 hasConcept C204321447 @default.
- W2274960229 hasConcept C2776401178 @default.
- W2274960229 hasConcept C41008148 @default.
- W2274960229 hasConcept C41895202 @default.
- W2274960229 hasConcept C96146094 @default.
- W2274960229 hasConceptScore W2274960229C138885662 @default.
- W2274960229 hasConceptScore W2274960229C154945302 @default.
- W2274960229 hasConceptScore W2274960229C186644900 @default.
- W2274960229 hasConceptScore W2274960229C19235068 @default.
- W2274960229 hasConceptScore W2274960229C199360897 @default.
- W2274960229 hasConceptScore W2274960229C204321447 @default.
- W2274960229 hasConceptScore W2274960229C2776401178 @default.
- W2274960229 hasConceptScore W2274960229C41008148 @default.
- W2274960229 hasConceptScore W2274960229C41895202 @default.
- W2274960229 hasConceptScore W2274960229C96146094 @default.
- W2274960229 hasLocation W22749602291 @default.
- W2274960229 hasOpenAccess W2274960229 @default.
- W2274960229 hasPrimaryLocation W22749602291 @default.
- W2274960229 hasRelatedWork W1490399361 @default.
- W2274960229 hasRelatedWork W1573527773 @default.
- W2274960229 hasRelatedWork W1575702981 @default.
- W2274960229 hasRelatedWork W1977707485 @default.
- W2274960229 hasRelatedWork W1987957315 @default.
- W2274960229 hasRelatedWork W1988874190 @default.
- W2274960229 hasRelatedWork W2059844773 @default.
- W2274960229 hasRelatedWork W2072010263 @default.
- W2274960229 hasRelatedWork W2077117307 @default.
- W2274960229 hasRelatedWork W2138360330 @default.
- W2274960229 hasRelatedWork W2153287996 @default.
- W2274960229 hasRelatedWork W2605431762 @default.
- W2274960229 hasRelatedWork W2904870873 @default.
- W2274960229 hasRelatedWork W2944016808 @default.
- W2274960229 hasRelatedWork W3037697089 @default.
- W2274960229 hasRelatedWork W89821604 @default.
- W2274960229 hasRelatedWork W1000592743 @default.
- W2274960229 hasRelatedWork W1486711477 @default.
- W2274960229 hasRelatedWork W2839426377 @default.
- W2274960229 hasRelatedWork W2840436729 @default.
- W2274960229 isParatext "false" @default.
- W2274960229 isRetracted "false" @default.
- W2274960229 magId "2274960229" @default.
- W2274960229 workType "article" @default.