Matches in SemOpenAlex for { <https://semopenalex.org/work/W2111580947> ?p ?o ?g. }
Showing items 1 to 87 of 87 with 100 items per page.
- W2111580947 abstract "This year, the IR group of Tsinghua University used its TMiner text retrieval system for indexing and retrieval in the Terabyte track ad hoc and named-page subtasks. For both tasks, we used the in-link anchor texts (the anchor texts of the URLs that point to the current page in the collection) together with the content texts of the web pages to build the indices. For retrieval, the word-pair method (1) was used and proved effective on the 2004 and 2005 Terabyte ad hoc task topics and on the 2005 named-page task. We provide further analysis of the performance of the word-pair method in comparison with the Markov random field term dependence model of (2) and another generative phrase model we proposed, which is more natural in the language modeling framework (3). 1. TMiner at Terabyte 2005. On a PC with 2GB of memory, one CPU, and IDE hard disks, TMiner could index 50GB of text (about 200GB of HTML files) in tolerable time. But since the terabyte collection contains about 100GB of pure text (110GB including anchor texts), building a single index for such a large collection would cost TMiner too much time. We built 27 indices for the 27 parts of the collection in our experiments. When retrieving, we summed the DF values of the query terms from each index and assigned the BM2500 RSV to documents in the collection according to the DF sum. This distributed index system returns the exact RSV, as if a single index had been constructed for the whole collection (at the expense of additional query processing time). In the ad hoc and named-page tasks, the index of in-link anchor text combined with page content was used. In our observation this is the most effective way of combining anchor text for retrieval, and we did not build indices without in-link anchor text for comparison. In addition to the use of anchor text, since the indices we built contain full position information for the index terms, the word-pair method (1) was used in both tasks." @default.
- W2111580947 created "2016-06-24" @default.
- W2111580947 creator A5000699907 @default.
- W2111580947 creator A5040969707 @default.
- W2111580947 creator A5089791620 @default.
- W2111580947 date "2005-01-01" @default.
- W2111580947 modified "2023-09-24" @default.
- W2111580947 title "THUIR at TREC 2005 Terabyte Track." @default.
- W2111580947 cites W2067802667 @default.
- W2111580947 cites W2070740689 @default.
- W2111580947 cites W2093390569 @default.
- W2111580947 cites W2136542423 @default.
- W2111580947 cites W2136583886 @default.
- W2111580947 cites W2167487145 @default.
- W2111580947 hasPublicationYear "2005" @default.
- W2111580947 type Work @default.
- W2111580947 sameAs 2111580947 @default.
- W2111580947 citedByCount "7" @default.
- W2111580947 crossrefType "proceedings-article" @default.
- W2111580947 hasAuthorship W2111580947A5000699907 @default.
- W2111580947 hasAuthorship W2111580947A5040969707 @default.
- W2111580947 hasAuthorship W2111580947A5089791620 @default.
- W2111580947 hasConcept C111919701 @default.
- W2111580947 hasConcept C136764020 @default.
- W2111580947 hasConcept C138885662 @default.
- W2111580947 hasConcept C14838553 @default.
- W2111580947 hasConcept C162324750 @default.
- W2111580947 hasConcept C164120249 @default.
- W2111580947 hasConcept C187736073 @default.
- W2111580947 hasConcept C199683683 @default.
- W2111580947 hasConcept C23123220 @default.
- W2111580947 hasConcept C2777382242 @default.
- W2111580947 hasConcept C2780451532 @default.
- W2111580947 hasConcept C37202355 @default.
- W2111580947 hasConcept C41008148 @default.
- W2111580947 hasConcept C41895202 @default.
- W2111580947 hasConcept C50954386 @default.
- W2111580947 hasConcept C75165309 @default.
- W2111580947 hasConcept C77088390 @default.
- W2111580947 hasConcept C90805587 @default.
- W2111580947 hasConcept C97854310 @default.
- W2111580947 hasConceptScore W2111580947C111919701 @default.
- W2111580947 hasConceptScore W2111580947C136764020 @default.
- W2111580947 hasConceptScore W2111580947C138885662 @default.
- W2111580947 hasConceptScore W2111580947C14838553 @default.
- W2111580947 hasConceptScore W2111580947C162324750 @default.
- W2111580947 hasConceptScore W2111580947C164120249 @default.
- W2111580947 hasConceptScore W2111580947C187736073 @default.
- W2111580947 hasConceptScore W2111580947C199683683 @default.
- W2111580947 hasConceptScore W2111580947C23123220 @default.
- W2111580947 hasConceptScore W2111580947C2777382242 @default.
- W2111580947 hasConceptScore W2111580947C2780451532 @default.
- W2111580947 hasConceptScore W2111580947C37202355 @default.
- W2111580947 hasConceptScore W2111580947C41008148 @default.
- W2111580947 hasConceptScore W2111580947C41895202 @default.
- W2111580947 hasConceptScore W2111580947C50954386 @default.
- W2111580947 hasConceptScore W2111580947C75165309 @default.
- W2111580947 hasConceptScore W2111580947C77088390 @default.
- W2111580947 hasConceptScore W2111580947C90805587 @default.
- W2111580947 hasConceptScore W2111580947C97854310 @default.
- W2111580947 hasLocation W21115809471 @default.
- W2111580947 hasOpenAccess W2111580947 @default.
- W2111580947 hasPrimaryLocation W21115809471 @default.
- W2111580947 hasRelatedWork W124245717 @default.
- W2111580947 hasRelatedWork W151423689 @default.
- W2111580947 hasRelatedWork W1604541064 @default.
- W2111580947 hasRelatedWork W1918432453 @default.
- W2111580947 hasRelatedWork W1979459060 @default.
- W2111580947 hasRelatedWork W2034943479 @default.
- W2111580947 hasRelatedWork W2086253379 @default.
- W2111580947 hasRelatedWork W2177617093 @default.
- W2111580947 hasRelatedWork W2191209163 @default.
- W2111580947 hasRelatedWork W2311422303 @default.
- W2111580947 hasRelatedWork W2403288017 @default.
- W2111580947 hasRelatedWork W2406591195 @default.
- W2111580947 hasRelatedWork W2407544020 @default.
- W2111580947 hasRelatedWork W2465751108 @default.
- W2111580947 hasRelatedWork W2915677652 @default.
- W2111580947 hasRelatedWork W2996957750 @default.
- W2111580947 hasRelatedWork W2816492461 @default.
- W2111580947 hasRelatedWork W2847805717 @default.
- W2111580947 hasRelatedWork W2934068012 @default.
- W2111580947 hasRelatedWork W3108518371 @default.
- W2111580947 isParatext "false" @default.
- W2111580947 isRetracted "false" @default.
- W2111580947 magId "2111580947" @default.
- W2111580947 workType "article" @default.
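
The abstract above describes scoring against several partial indices by summing the per-index DF values, so that the result matches a single global index. The following is a minimal Python sketch of that idea, not the TMiner implementation. It uses a common textbook form of the BM25 weight in place of the BM2500 RSV. Everything beyond the DF sum is an assumption made here: the function names (`build_shard`, `global_stats`, `bm25`, `search`), the parameters `k1` and `b`, and the aggregation of document count and average document length across shards.

```python
import math
from collections import Counter


def build_shard(docs):
    """Build a toy partial index from {doc_id: [tokens]} (hypothetical layout)."""
    df = Counter()
    for tokens in docs.values():
        df.update(set(tokens))  # each document counts once per term
    return {
        "docs": docs,
        "df": df,
        "n_docs": len(docs),
        "total_len": sum(len(tokens) for tokens in docs.values()),
    }


def global_stats(shards, query_terms):
    """Sum DF, document count, and total length over all shards."""
    n_docs = sum(s["n_docs"] for s in shards)
    total_len = sum(s["total_len"] for s in shards)
    df = {t: sum(s["df"][t] for s in shards) for t in query_terms}
    return n_docs, total_len / n_docs, df


def bm25(tokens, query_terms, n_docs, avgdl, df, k1=1.2, b=0.75):
    """Textbook BM25 score of one document (assumed stand-in for BM2500)."""
    tf = Counter(tokens)
    score = 0.0
    for t in query_terms:
        if tf[t] == 0 or df[t] == 0:
            continue
        idf = math.log(1 + (n_docs - df[t] + 0.5) / (df[t] + 0.5))
        length_norm = k1 * (1 - b + b * len(tokens) / avgdl)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + length_norm)
    return score


def search(shards, query_terms):
    """Score every shard's documents with the collection-wide statistics."""
    n_docs, avgdl, df = global_stats(shards, query_terms)
    results = []
    for shard in shards:
        for doc_id, tokens in shard["docs"].items():
            score = bm25(tokens, query_terms, n_docs, avgdl, df)
            if score > 0:
                results.append((doc_id, score))
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

Because the statistics are gathered from every shard before any document is scored, the ranking from `search([shard_a, shard_b], terms)` should equal the ranking from a single shard built over the union of the documents. The extra pass over the shards corresponds to the additional query processing time the abstract mentions.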