Matches in SemOpenAlex for { <https://semopenalex.org/work/W1997783092> ?p ?o ?g. }
- W1997783092 endingPage "30" @default.
- W1997783092 startingPage "1" @default.
- W1997783092 abstract "The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this article: 1) How to create a simple, language-independent corpus-based stemmer, 2) How to identify sub-words and which types of sub-words are suitable as indexing units, and 3) How to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful in the case of few language-specific resources. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best, for Bengali and Marathi, overlapping 3-grams obtain the best result, and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length.
The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages." @default.
- W1997783092 created "2016-06-24" @default.
- W1997783092 creator A5060984299 @default.
- W1997783092 creator A5063045515 @default.
- W1997783092 date "2010-09-01" @default.
- W1997783092 modified "2023-10-16" @default.
- W1997783092 title "Sub-Word Indexing and Blind Relevance Feedback for English, Bengali, Hindi, and Marathi IR" @default.
- W1997783092 cites W1517312174 @default.
- W1997783092 cites W1965282483 @default.
- W1997783092 cites W1978022086 @default.
- W1997783092 cites W1979076595 @default.
- W1997783092 cites W1986909372 @default.
- W1997783092 cites W2000635479 @default.
- W1997783092 cites W2002157036 @default.
- W1997783092 cites W2008495066 @default.
- W1997783092 cites W2020647130 @default.
- W1997783092 cites W2028546017 @default.
- W1997783092 cites W2038114184 @default.
- W1997783092 cites W2043909051 @default.
- W1997783092 cites W2050661945 @default.
- W1997783092 cites W2054364203 @default.
- W1997783092 cites W2058200372 @default.
- W1997783092 cites W2065096648 @default.
- W1997783092 cites W2076921108 @default.
- W1997783092 cites W2095277595 @default.
- W1997783092 cites W2098162425 @default.
- W1997783092 cites W2100259670 @default.
- W1997783092 cites W2101711363 @default.
- W1997783092 cites W2105981469 @default.
- W1997783092 cites W2122661071 @default.
- W1997783092 cites W2135922393 @default.
- W1997783092 cites W2138958299 @default.
- W1997783092 cites W2153252192 @default.
- W1997783092 cites W2168965629 @default.
- W1997783092 cites W2186490579 @default.
- W1997783092 cites W2442629089 @default.
- W1997783092 cites W4231856373 @default.
- W1997783092 doi "https://doi.org/10.1145/1838745.1838749" @default.
- W1997783092 hasPublicationYear "2010" @default.
- W1997783092 type Work @default.
- W1997783092 sameAs 1997783092 @default.
- W1997783092 citedByCount "9" @default.
- W1997783092 countsByYear W19977830922012 @default.
- W1997783092 countsByYear W19977830922013 @default.
- W1997783092 countsByYear W19977830922014 @default.
- W1997783092 countsByYear W19977830922015 @default.
- W1997783092 countsByYear W19977830922020 @default.
- W1997783092 countsByYear W19977830922021 @default.
- W1997783092 crossrefType "journal-article" @default.
- W1997783092 hasAuthorship W1997783092A5060984299 @default.
- W1997783092 hasAuthorship W1997783092A5063045515 @default.
- W1997783092 hasBestOaLocation W19977830922 @default.
- W1997783092 hasConcept C115961682 @default.
- W1997783092 hasConcept C120665830 @default.
- W1997783092 hasConcept C121332964 @default.
- W1997783092 hasConcept C137293760 @default.
- W1997783092 hasConcept C138885662 @default.
- W1997783092 hasConcept C141603448 @default.
- W1997783092 hasConcept C154945302 @default.
- W1997783092 hasConcept C158154518 @default.
- W1997783092 hasConcept C1667742 @default.
- W1997783092 hasConcept C17744445 @default.
- W1997783092 hasConcept C192209626 @default.
- W1997783092 hasConcept C19235068 @default.
- W1997783092 hasConcept C199539241 @default.
- W1997783092 hasConcept C204321447 @default.
- W1997783092 hasConcept C23123220 @default.
- W1997783092 hasConcept C2776844415 @default.
- W1997783092 hasConcept C2779532271 @default.
- W1997783092 hasConcept C41008148 @default.
- W1997783092 hasConcept C41895202 @default.
- W1997783092 hasConcept C519982507 @default.
- W1997783092 hasConcept C75165309 @default.
- W1997783092 hasConcept C89686163 @default.
- W1997783092 hasConcept C90805587 @default.
- W1997783092 hasConceptScore W1997783092C115961682 @default.
- W1997783092 hasConceptScore W1997783092C120665830 @default.
- W1997783092 hasConceptScore W1997783092C121332964 @default.
- W1997783092 hasConceptScore W1997783092C137293760 @default.
- W1997783092 hasConceptScore W1997783092C138885662 @default.
- W1997783092 hasConceptScore W1997783092C141603448 @default.
- W1997783092 hasConceptScore W1997783092C154945302 @default.
- W1997783092 hasConceptScore W1997783092C158154518 @default.
- W1997783092 hasConceptScore W1997783092C1667742 @default.
- W1997783092 hasConceptScore W1997783092C17744445 @default.
- W1997783092 hasConceptScore W1997783092C192209626 @default.
- W1997783092 hasConceptScore W1997783092C19235068 @default.
- W1997783092 hasConceptScore W1997783092C199539241 @default.
- W1997783092 hasConceptScore W1997783092C204321447 @default.
- W1997783092 hasConceptScore W1997783092C23123220 @default.
- W1997783092 hasConceptScore W1997783092C2776844415 @default.
- W1997783092 hasConceptScore W1997783092C2779532271 @default.
- W1997783092 hasConceptScore W1997783092C41008148 @default.
- W1997783092 hasConceptScore W1997783092C41895202 @default.
- W1997783092 hasConceptScore W1997783092C519982507 @default.
- W1997783092 hasConceptScore W1997783092C75165309 @default.
- W1997783092 hasConceptScore W1997783092C89686163 @default.
- W1997783092 hasConceptScore W1997783092C90805587 @default.