Matches in SemOpenAlex for { <https://semopenalex.org/work/W2128941908> ?p ?o ?g. }
Showing items 1 to 96 of 96 with 100 items per page.
- W2128941908 abstract "We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from a search engine's index using only the search engine's public interface? Such a primitive is particularly useful in creating objective benchmarks for search engines. The technique of Bharat and Broder suffers from two well recorded biases: it favors long documents and highly ranked documents. In this paper we introduce two novel sampling techniques: a lexicon-based technique and a random walk technique. Our methods produce biased sample documents, but each sample is accompanied by a corresponding weight, which represents the probability of this document to be selected in the sample. The samples, in conjunction with the weights, are then used to simulate near-uniform samples. To this end, we resort to three well known Monte Carlo simulation methods: rejection sampling, importance sampling and the Metropolis-Hastings algorithm. We analyze our methods rigorously and prove that under plausible assumptions, our techniques are guaranteed to produce near-uniform samples from the search engine's index. Experiments on a corpus of 2.4 million documents substantiate our analytical findings and show that our algorithms do not have significant bias towards long or highly ranked documents. We use our algorithms to collect fresh data about the relative sizes of Google, MSN Search, and Yahoo!." @default.
- W2128941908 created "2016-06-24" @default.
- W2128941908 creator A5054834255 @default.
- W2128941908 creator A5086993894 @default.
- W2128941908 date "2006-05-23" @default.
- W2128941908 modified "2023-09-28" @default.
- W2128941908 title "Random sampling from a search engine's index" @default.
- W2128941908 cites W1598759141 @default.
- W2128941908 cites W1659541576 @default.
- W2128941908 cites W1964038241 @default.
- W2128941908 cites W2019473674 @default.
- W2128941908 cites W2028716813 @default.
- W2128941908 cites W2056760934 @default.
- W2128941908 cites W2080676333 @default.
- W2128941908 cites W2117850397 @default.
- W2128941908 cites W2125125501 @default.
- W2128941908 cites W2138309709 @default.
- W2128941908 cites W2144959234 @default.
- W2128941908 cites W2247055361 @default.
- W2128941908 doi "https://doi.org/10.1145/1135777.1135833" @default.
- W2128941908 hasPublicationYear "2006" @default.
- W2128941908 type Work @default.
- W2128941908 sameAs 2128941908 @default.
- W2128941908 citedByCount "115" @default.
- W2128941908 countsByYear W21289419082012 @default.
- W2128941908 countsByYear W21289419082013 @default.
- W2128941908 countsByYear W21289419082014 @default.
- W2128941908 countsByYear W21289419082015 @default.
- W2128941908 countsByYear W21289419082016 @default.
- W2128941908 countsByYear W21289419082020 @default.
- W2128941908 crossrefType "proceedings-article" @default.
- W2128941908 hasAuthorship W2128941908A5054834255 @default.
- W2128941908 hasAuthorship W2128941908A5086993894 @default.
- W2128941908 hasBestOaLocation W21289419082 @default.
- W2128941908 hasConcept C105795698 @default.
- W2128941908 hasConcept C106131492 @default.
- W2128941908 hasConcept C113843644 @default.
- W2128941908 hasConcept C124101348 @default.
- W2128941908 hasConcept C129307140 @default.
- W2128941908 hasConcept C129848803 @default.
- W2128941908 hasConcept C136764020 @default.
- W2128941908 hasConcept C140779682 @default.
- W2128941908 hasConcept C154945302 @default.
- W2128941908 hasConcept C157915830 @default.
- W2128941908 hasConcept C173608175 @default.
- W2128941908 hasConcept C185592680 @default.
- W2128941908 hasConcept C19499675 @default.
- W2128941908 hasConcept C198531522 @default.
- W2128941908 hasConcept C23123220 @default.
- W2128941908 hasConcept C2777382242 @default.
- W2128941908 hasConcept C2778121359 @default.
- W2128941908 hasConcept C31972630 @default.
- W2128941908 hasConcept C33923547 @default.
- W2128941908 hasConcept C41008148 @default.
- W2128941908 hasConcept C43617362 @default.
- W2128941908 hasConcept C97854310 @default.
- W2128941908 hasConceptScore W2128941908C105795698 @default.
- W2128941908 hasConceptScore W2128941908C106131492 @default.
- W2128941908 hasConceptScore W2128941908C113843644 @default.
- W2128941908 hasConceptScore W2128941908C124101348 @default.
- W2128941908 hasConceptScore W2128941908C129307140 @default.
- W2128941908 hasConceptScore W2128941908C129848803 @default.
- W2128941908 hasConceptScore W2128941908C136764020 @default.
- W2128941908 hasConceptScore W2128941908C140779682 @default.
- W2128941908 hasConceptScore W2128941908C154945302 @default.
- W2128941908 hasConceptScore W2128941908C157915830 @default.
- W2128941908 hasConceptScore W2128941908C173608175 @default.
- W2128941908 hasConceptScore W2128941908C185592680 @default.
- W2128941908 hasConceptScore W2128941908C19499675 @default.
- W2128941908 hasConceptScore W2128941908C198531522 @default.
- W2128941908 hasConceptScore W2128941908C23123220 @default.
- W2128941908 hasConceptScore W2128941908C2777382242 @default.
- W2128941908 hasConceptScore W2128941908C2778121359 @default.
- W2128941908 hasConceptScore W2128941908C31972630 @default.
- W2128941908 hasConceptScore W2128941908C33923547 @default.
- W2128941908 hasConceptScore W2128941908C41008148 @default.
- W2128941908 hasConceptScore W2128941908C43617362 @default.
- W2128941908 hasConceptScore W2128941908C97854310 @default.
- W2128941908 hasLocation W21289419081 @default.
- W2128941908 hasLocation W21289419082 @default.
- W2128941908 hasOpenAccess W2128941908 @default.
- W2128941908 hasPrimaryLocation W21289419081 @default.
- W2128941908 hasRelatedWork W1601704076 @default.
- W2128941908 hasRelatedWork W1979246953 @default.
- W2128941908 hasRelatedWork W1993731342 @default.
- W2128941908 hasRelatedWork W1996590479 @default.
- W2128941908 hasRelatedWork W2119135658 @default.
- W2128941908 hasRelatedWork W2161902337 @default.
- W2128941908 hasRelatedWork W275516553 @default.
- W2128941908 hasRelatedWork W4382644495 @default.
- W2128941908 hasRelatedWork W4385326376 @default.
- W2128941908 hasRelatedWork W2184999834 @default.
- W2128941908 isParatext "false" @default.
- W2128941908 isRetracted "false" @default.
- W2128941908 magId "2128941908" @default.
- W2128941908 workType "article" @default.
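
The abstract above says biased samples, each carrying the probability with which it was selected, can be turned into near-uniform samples by rejection sampling. A minimal sketch of that general idea follows. It is not the paper's implementation: the four-document toy index, the selection probabilities, and the names `draw_biased` and `rejection_sample` are all hypothetical, and the sketch assumes a lower bound on the selection probability is known.

```python
import random
from collections import Counter

# Hypothetical toy "index": four documents that a biased sampler
# returns with unequal, known probabilities (the "weights").
DOCS = ["a", "b", "c", "d"]
PROBS = [0.4, 0.3, 0.2, 0.1]


def draw_biased(rng):
    """Return (document, selection probability) from the biased sampler."""
    i = rng.choices(range(len(DOCS)), weights=PROBS, k=1)[0]
    return DOCS[i], PROBS[i]


def rejection_sample(draw, min_prob, n, rng):
    """Collect n samples that are uniform over the sampler's support.

    min_prob must be a lower bound on the selection probability of every
    document. A draw with probability p is accepted with probability
    min_prob / p, so each document is accepted at the same overall rate.
    """
    accepted = []
    while len(accepted) < n:
        doc, prob = draw(rng)
        if rng.random() < min_prob / prob:
            accepted.append(doc)
    return accepted


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = rejection_sample(draw_biased, min(PROBS), 20000, rng)
counts = Counter(samples)
for doc in DOCS:
    print(doc, round(counts[doc] / len(samples), 3))
```

Each document should appear in roughly a quarter of the accepted samples, even though the underlying sampler favors "a" four to one over "d". The cost is wasted draws: only a fraction of biased samples survives, and that fraction shrinks as the bias grows.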