Matches in SemOpenAlex for { <https://semopenalex.org/work/W657439651> ?p ?o ?g. }
- W657439651 abstract "Searching the web is critical to the Web's success. However, the frequency of searches, together with the size of the index, prevents a single computer from coping with the computational load. Consequently, a variety of distributed architectures have been proposed. Commercial search engines, such as Google, usually use an architecture where the index is distributed but centrally managed over a number of disjoint partitions. This centralized architecture has a high capital and operating cost that presents a significant barrier preventing any new competitor from entering the search market. The dominance of a few Web search giants brings concerns about the objectivity of search results and the privacy of the user. A promising solution to eliminate the high cost of entry is to conduct the search on a peer-to-peer (P2P) architecture. Peer-to-peer architectures offer a more geographically dispersed arrangement of machines that are not centrally managed. This has the benefit of not requiring an expensive centralized server facility. However, the lack of centralized management can complicate the communication process, and the storage and computational capabilities of peers may be much less than those of nodes in a commercial search engine. P2P architectures are commonly categorized into two broad classes: structured and unstructured. Structured architectures guarantee that the entire index is searched for a query, but incur high communication costs during retrieval and maintenance. In comparison, unstructured architectures do not guarantee that the entire index is searched, but require less maintenance and are more robust to attacks. In this thesis we study the quality of probabilistic search in an unstructured distributed network, since such a network has potential for developing a low-cost and robust large-scale information retrieval system. 
Search in an unstructured distributed network is a challenge, since a single machine can normally store only a subset of documents, and a query is sent to only a subset of machines, due to limitations on computational and communication resources. Thus, IR systems built on such a network do not guarantee that a query finds the required documents in the collection, and the search has to be probabilistic and non-deterministic. The search quality is measured by a new metric called accuracy, defined as the fraction of documents retrieved by a constrained, probabilistic search compared with those that would have been retrieved by an exhaustive search. We propose a mathematical framework for modeling search in an unstructured distributed network, and present a non-deterministic distributed search architecture called Probably Approximately Correct (PAC) search. We provide formulas to estimate the search quality based on different system parameters, and show that PAC can achieve good performance when using the same amount of resources as a centrally managed deterministic distributed information retrieval system. We also study the effects of node selection in a centralized PAC architecture. We theoretically and empirically analyze the search performance across query iterations, and show that the search accuracy can be improved by caching well-performing nodes in a centralized PAC architecture. Experiments on a real document collection and query log support our analysis. We then investigate the effects of different document replication policies in a PAC IR system. We show that the traditional square-root replication policy is not optimal for maximizing accuracy, and give an optimality criterion for accuracy. A non-uniform distribution of documents improves the retrieval performance of popular documents at the expense of less popular documents. To compensate for this, we propose a hybrid replication policy consisting of a combination of uniform and non-uniform distributions. 
Theoretical and experimental results show that such an arrangement significantly improves the accuracy of less popular documents at the expense of only a small degradation in accuracy averaged over all queries. We finally explore the effects of query caching in the PAC architecture. We empirically analyze the search performance of queries being issued from a query log, and show that the search accuracy can be improved by caching the top-$k$ documents on each node. Simulations on a real document collection and query log support our analysis." @default.
- W657439651 created "2016-06-24" @default.
- W657439651 creator A5003649408 @default.
- W657439651 date "2012-10-28" @default.
- W657439651 modified "2023-09-26" @default.
- W657439651 title "The Quality of Probabilistic Search in Unstructured Distributed Information Retrieval Systems" @default.
- W657439651 cites W142212369 @default.
- W657439651 cites W1482214997 @default.
- W657439651 cites W1483924388 @default.
- W657439651 cites W1558573344 @default.
- W657439651 cites W1603805323 @default.
- W657439651 cites W1650675509 @default.
- W657439651 cites W1701068577 @default.
- W657439651 cites W1963547452 @default.
- W657439651 cites W1965690069 @default.
- W657439651 cites W1967879792 @default.
- W657439651 cites W2004441230 @default.
- W657439651 cites W2017005873 @default.
- W657439651 cites W204424277 @default.
- W657439651 cites W2045447437 @default.
- W657439651 cites W2048045485 @default.
- W657439651 cites W2056363353 @default.
- W657439651 cites W2073965851 @default.
- W657439651 cites W2093390569 @default.
- W657439651 cites W2093872623 @default.
- W657439651 cites W2098861879 @default.
- W657439651 cites W2103363198 @default.
- W657439651 cites W2105599110 @default.
- W657439651 cites W2110679325 @default.
- W657439651 cites W2111346848 @default.
- W657439651 cites W2118572087 @default.
- W657439651 cites W2138830906 @default.
- W657439651 cites W2142863519 @default.
- W657439651 cites W2146912629 @default.
- W657439651 cites W2149721632 @default.
- W657439651 cites W2152153601 @default.
- W657439651 cites W2158195707 @default.
- W657439651 cites W2162266953 @default.
- W657439651 cites W2163498454 @default.
- W657439651 cites W2165612380 @default.
- W657439651 cites W2174335063 @default.
- W657439651 cites W2174507869 @default.
- W657439651 cites W232533489 @default.
- W657439651 cites W239964209 @default.
- W657439651 cites W2482128004 @default.
- W657439651 cites W2615126720 @default.
- W657439651 hasPublicationYear "2012" @default.
- W657439651 type Work @default.
- W657439651 sameAs 657439651 @default.
- W657439651 citedByCount "0" @default.
- W657439651 crossrefType "dissertation" @default.
- W657439651 hasAuthorship W657439651A5003649408 @default.
- W657439651 hasConcept C120314980 @default.
- W657439651 hasConcept C123657996 @default.
- W657439651 hasConcept C136764020 @default.
- W657439651 hasConcept C142362112 @default.
- W657439651 hasConcept C153349607 @default.
- W657439651 hasConcept C154945302 @default.
- W657439651 hasConcept C164120249 @default.
- W657439651 hasConcept C173979980 @default.
- W657439651 hasConcept C41008148 @default.
- W657439651 hasConcept C49937458 @default.
- W657439651 hasConcept C75165309 @default.
- W657439651 hasConcept C97854310 @default.
- W657439651 hasConceptScore W657439651C120314980 @default.
- W657439651 hasConceptScore W657439651C123657996 @default.
- W657439651 hasConceptScore W657439651C136764020 @default.
- W657439651 hasConceptScore W657439651C142362112 @default.
- W657439651 hasConceptScore W657439651C153349607 @default.
- W657439651 hasConceptScore W657439651C154945302 @default.
- W657439651 hasConceptScore W657439651C164120249 @default.
- W657439651 hasConceptScore W657439651C173979980 @default.
- W657439651 hasConceptScore W657439651C41008148 @default.
- W657439651 hasConceptScore W657439651C49937458 @default.
- W657439651 hasConceptScore W657439651C75165309 @default.
- W657439651 hasConceptScore W657439651C97854310 @default.
- W657439651 hasLocation W6574396511 @default.
- W657439651 hasOpenAccess W657439651 @default.
- W657439651 hasPrimaryLocation W6574396511 @default.
- W657439651 hasRelatedWork W1483456319 @default.
- W657439651 hasRelatedWork W2011557772 @default.
- W657439651 hasRelatedWork W2039678943 @default.
- W657439651 hasRelatedWork W2054346354 @default.
- W657439651 hasRelatedWork W2063575700 @default.
- W657439651 hasRelatedWork W2105177147 @default.
- W657439651 hasRelatedWork W2111883680 @default.
- W657439651 hasRelatedWork W2123515190 @default.
- W657439651 hasRelatedWork W2124838580 @default.
- W657439651 hasRelatedWork W2137121051 @default.
- W657439651 hasRelatedWork W2142904663 @default.
- W657439651 hasRelatedWork W2149341296 @default.
- W657439651 hasRelatedWork W2167922649 @default.
- W657439651 hasRelatedWork W2171648605 @default.
- W657439651 hasRelatedWork W2182923201 @default.
- W657439651 hasRelatedWork W2308250921 @default.
- W657439651 hasRelatedWork W2335855784 @default.
- W657439651 hasRelatedWork W26889595 @default.
- W657439651 hasRelatedWork W2773667291 @default.
- W657439651 hasRelatedWork W31957839 @default.
- W657439651 isParatext "false" @default.
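
The abstract above defines accuracy as the fraction of documents retrieved by a constrained, probabilistic search compared with those an exhaustive search would have retrieved. The following Python sketch illustrates that definition only; it is not the thesis's framework or its formulas. It assumes, purely for illustration, that every node stores a uniform random sample of the collection and that a query is sent to a uniform random subset of nodes. All names (`expected_accuracy`, `simulate_accuracy`, and their parameters) are hypothetical.

```python
import random


def expected_accuracy(p_doc_on_node: float, nodes_queried: int) -> float:
    """Chance that a document is found on at least one queried node.

    Assumption (not from the source): each queried node holds the document
    independently with probability p_doc_on_node.
    """
    return 1.0 - (1.0 - p_doc_on_node) ** nodes_queried


def simulate_accuracy(num_docs, num_nodes, docs_per_node, nodes_queried,
                      relevant, trials=1000, seed=0):
    """Average fraction of `relevant` documents found per query.

    `relevant` stands in for what an exhaustive search would return.
    Uniform random storage and node selection are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Each node stores a uniform random sample of the collection.
    nodes = [set(rng.sample(range(num_docs), docs_per_node))
             for _ in range(num_nodes)]
    relevant = set(relevant)
    total = 0.0
    for _ in range(trials):
        # The query reaches only a random subset of the nodes.
        queried = rng.sample(nodes, nodes_queried)
        found = set().union(*queried) & relevant
        total += len(found) / len(relevant)
    return total / trials


if __name__ == "__main__":
    print(expected_accuracy(0.1, 10))
    print(simulate_accuracy(1000, 200, 100, 10, relevant=range(10)))
```

Under these assumptions, accuracy rises when either the per-node storage or the number of nodes queried increases, which is the resource trade-off the abstract describes.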