Matches in SemOpenAlex for { <https://semopenalex.org/work/W622962318> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W622962318 abstract "Cross-Language Information Retrieval (CLIR) deals with the problem of retrieving documents written in a language different from that of the query. The popularization of the Internet has enabled access to less widely known languages such as Arabic and has made CLIR involving Arabic increasingly in demand. However, a morphologically rich and complex language such as Arabic presents many challenges for Information Retrieval (IR) and for natural language processing in general. This thesis investigates the problems of Arabic monolingual information retrieval and English-Arabic CLIR. In Arabic monolingual IR, one of the main challenges is morphological processing, which aims to determine an appropriate index form for each word. In this thesis, we try to identify the best stemming technique for Arabic words. To this end, we propose a new approach that tries to determine the core of a word according to linguistic rules and corpus statistics. This approach is compared to the traditional approach, which applies light truncation at both ends of a word. Both methods are tested and compared on a large test collection. The results show that the proposed method is more effective than the traditional one. In CLIR, the major challenge is translating the query into the document language. Full-text machine translation is not well suited to query translation, since queries are rarely sentences and more often just a sequence of words without syntactic structure. In this context, bilingual dictionaries and parallel corpora become attractive alternatives, especially since query translation aims to suggest good terms for retrieving documents rather than to produce a human-readable sentence. However, resources for Arabic are limited. Machine-readable dictionaries exist, but there are few parallel corpora comparable to Hansard.
So we first exploited the Web to automatically build a corpus of English-Arabic parallel Web pages. From this corpus, a statistical translation model is trained for English-Arabic CLIR. When resources are limited, it is often advantageous to combine several of them for translation. Indeed, combining several translation resources can improve the coverage of query terms and can produce a query expansion effect, which is highly desirable in IR. This thesis studies two combination techniques. The first is traditional: it combines the resources linearly, merging the translations that different resources suggest for the same word and assigning a global confidence to each resource. The second method uses confidence factors associated with each individual translation. This new combination method reconsiders all the translation candidates proposed by the different resources and, by introducing additional features, re-evaluates them to assign a new score. Both methods were tested on two English-Arabic CLIR collections, and the results show that the method using confidence factors performs better than the traditional one. This thesis makes two contributions. First, it proposes a new method for stemming Arabic words that is better suited to IR. Second, it proposes a new method for combining translation resources using confidence factors. To our knowledge, this is the first time confidence factors have been used in the context of CLIR. Keywords: Arabic information retrieval, cross-language information retrieval, stemming, query translation, combination of translation resources, confidence factors." @default.
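The "traditional" linear combination of translation resources described in the abstract can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the thesis's implementation: the table layout, the function name `linear_combination`, the resource names, the transliterated Arabic candidates, and the weights are all hypothetical placeholders. Each resource r is assumed to supply probabilities P_r(t | s), and each receives one global confidence weight.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical layout: resource name -> source term -> {translation: P_r(t | s)}
TranslationTables = Dict[str, Dict[str, Dict[str, float]]]


def linear_combination(
    term: str,
    resources: TranslationTables,
    weights: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Score each candidate t for `term` as sum over r of weights[r] * P_r(t | term).

    Each resource r receives one global confidence weights[r] (assumed here
    to sum to 1). A resource that does not cover `term` contributes nothing,
    so pooling resources widens the coverage of query terms.
    """
    combined: Dict[str, float] = defaultdict(float)
    for name, table in resources.items():
        for translation, prob in table.get(term, {}).items():
            combined[translation] += weights[name] * prob
    # Rank candidates from highest to lowest combined score.
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: transliterated Arabic placeholders and made-up probabilities.
RESOURCES: TranslationTables = {
    "dictionary": {"book": {"kitab": 0.6, "daftar": 0.4}},
    "parallel_corpus": {"book": {"kitab": 0.8, "kutub": 0.2}},
}
WEIGHTS = {"dictionary": 0.3, "parallel_corpus": 0.7}

if __name__ == "__main__":
    for translation, score in linear_combination("book", RESOURCES, WEIGHTS):
        print(f"{translation}\t{score:.2f}")
```

With these toy numbers, "kitab" ranks first with 0.3 × 0.6 + 0.7 × 0.8 = 0.74, while candidates proposed by only one resource ("kutub", "daftar") are kept but down-weighted, which is where the query expansion effect comes from. The thesis's second method, which re-scores each candidate using per-translation confidence factors and additional features, is not modeled here.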
- W622962318 created "2016-06-24" @default.
- W622962318 creator A5027720258 @default.
- W622962318 date "2008-01-01" @default.
- W622962318 modified "2023-10-01" @default.
- W622962318 title "Recherche d'information translinguistique sur les documents en arabe" @default. (English: "Cross-language information retrieval on Arabic documents")
- W622962318 cites W1489181569 @default.
- W622962318 cites W1499109705 @default.
- W622962318 cites W1503868078 @default.
- W622962318 cites W1506674094 @default.
- W622962318 cites W1532325895 @default.
- W622962318 cites W1537000426 @default.
- W622962318 cites W1543107604 @default.
- W622962318 cites W1546195864 @default.
- W622962318 cites W1553682320 @default.
- W622962318 cites W1554663460 @default.
- W622962318 cites W1559783642 @default.
- W622962318 cites W1569415500 @default.
- W622962318 cites W1594864214 @default.
- W622962318 cites W1718341272 @default.
- W622962318 cites W1956559956 @default.
- W622962318 cites W1971220772 @default.
- W622962318 cites W1971285215 @default.
- W622962318 cites W1973923101 @default.
- W622962318 cites W1997841190 @default.
- W622962318 cites W2006969979 @default.
- W622962318 cites W2033937535 @default.
- W622962318 cites W2043909051 @default.
- W622962318 cites W2045137302 @default.
- W622962318 cites W2046456023 @default.
- W622962318 cites W2047959359 @default.
- W622962318 cites W2068905009 @default.
- W622962318 cites W2091014931 @default.
- W622962318 cites W2093390569 @default.
- W622962318 cites W2097802284 @default.
- W622962318 cites W2098162425 @default.
- W622962318 cites W2106638140 @default.
- W622962318 cites W2109933326 @default.
- W622962318 cites W2126815469 @default.
- W622962318 cites W2129264959 @default.
- W622962318 cites W2136542423 @default.
- W622962318 cites W2140354722 @default.
- W622962318 cites W2142756035 @default.
- W622962318 cites W2144746247 @default.
- W622962318 cites W2165612380 @default.
- W622962318 cites W2166968190 @default.
- W622962318 cites W2170694014 @default.
- W622962318 cites W2338181011 @default.
- W622962318 cites W2400327768 @default.
- W622962318 cites W2403310277 @default.
- W622962318 cites W2726585279 @default.
- W622962318 cites W48642672 @default.
- W622962318 cites W84280981 @default.
- W622962318 cites W88705112 @default.
- W622962318 cites W3143980388 @default.
- W622962318 hasPublicationYear "2008" @default.
- W622962318 type Work @default.
- W622962318 sameAs 622962318 @default.
- W622962318 citedByCount "2" @default.
- W622962318 countsByYear W6229623182014 @default.
- W622962318 countsByYear W6229623182018 @default.
- W622962318 crossrefType "dissertation" @default.
- W622962318 hasAuthorship W622962318A5027720258 @default.
- W622962318 hasConcept C138885662 @default.
- W622962318 hasConcept C151730666 @default.
- W622962318 hasConcept C154945302 @default.
- W622962318 hasConcept C203005215 @default.
- W622962318 hasConcept C204321447 @default.
- W622962318 hasConcept C23123220 @default.
- W622962318 hasConcept C2778842860 @default.
- W622962318 hasConcept C2779343474 @default.
- W622962318 hasConcept C41008148 @default.
- W622962318 hasConcept C41895202 @default.
- W622962318 hasConcept C86803240 @default.
- W622962318 hasConcept C90805587 @default.
- W622962318 hasConcept C96455323 @default.
- W622962318 hasConceptScore W622962318C138885662 @default.
- W622962318 hasConceptScore W622962318C151730666 @default.
- W622962318 hasConceptScore W622962318C154945302 @default.
- W622962318 hasConceptScore W622962318C203005215 @default.
- W622962318 hasConceptScore W622962318C204321447 @default.
- W622962318 hasConceptScore W622962318C23123220 @default.
- W622962318 hasConceptScore W622962318C2778842860 @default.
- W622962318 hasConceptScore W622962318C2779343474 @default.
- W622962318 hasConceptScore W622962318C41008148 @default.
- W622962318 hasConceptScore W622962318C41895202 @default.
- W622962318 hasConceptScore W622962318C86803240 @default.
- W622962318 hasConceptScore W622962318C90805587 @default.
- W622962318 hasConceptScore W622962318C96455323 @default.
- W622962318 hasLocation W6229623181 @default.
- W622962318 hasOpenAccess W622962318 @default.
- W622962318 hasPrimaryLocation W6229623181 @default.
- W622962318 isParatext "false" @default.
- W622962318 isRetracted "false" @default.
- W622962318 magId "622962318" @default.
- W622962318 workType "dissertation" @default.