Matches in SemOpenAlex for { <https://semopenalex.org/work/W2040708107> ?p ?o ?g. }
- W2040708107 endingPage "61" @default.
- W2040708107 startingPage "60" @default.
- W2040708107 abstract "Web search engines have to deal with a rapidly increasing amount of information, high query loads and tight performance constraints. The success of a search engine depends on the speed with which it answers queries (efficiency) and the quality of its answers (effectiveness). These two metrics have a large impact on the operational costs of the search engine and the overall user satisfaction, which determine the revenue of the search engine. In this context, any improvement in query processing efficiency can reduce the operational costs and improve user satisfaction, hence improve the overall benefit. In this thesis, we elaborate on query processing efficiency, address several problems within partitioned query processing, pruning and caching and propose several novel techniques: First, we look at term-wise partitioned indexes and address the main limitations of the state-of-the-art query processing methods. Our first approach combines the advantages of pipelined and traditional (non-pipelined) query processing. This approach assumes one disk access per posting list and traditional term-at-a-time processing. For the second approach, we follow an alternative direction and look at document-at-a-time processing of sub-queries and skipping. Subsequently, we present several skipping extensions to pipelined query processing, which, as we show, can improve the query processing performance and/or the quality of results. Then, we extend one of these methods with intra-query parallelism, which, as we show, can improve the performance at low query loads. Second, we look at skipping and pruning optimizations designed for a monolithic index. We present an efficient self-skipping inverted index designed for modern index compression methods and several query processing optimizations.
We show that these optimizations can provide a significant speed-up compared to a full (non-pruned) evaluation and reduce the performance gap between disjunctive (OR) and conjunctive (AND) queries. We also propose a linear programming optimization that can further improve the I/O, decompression and computation efficiency of Max-Score. Third, we elaborate on caching in Web search engines in two independent contributions. First, we present an analytical model that finds the optimal split in a static memory-based two-level cache. Second, we present several strategies for selecting, ordering and scheduling prefetch queries and demonstrate that these can improve the efficiency and effectiveness of Web search engines. We carefully evaluate our ideas either using a real implementation or by simulation using real-world text collections and query logs. Most of the proposed techniques are found to improve the state-of-the-art in the conducted empirical studies. However, the implications and applicability of these techniques in practice need further evaluation in real-life settings. This dissertation was completed at the Department of Computer and Information Science at the Norwegian University of Science and Technology (NTNU), advised by Prof. Svein Erik Bratsberg, Dr. Øystein Torbjørnsen and Dr. Magnus Lie Hetland. Some of the work was done in collaboration with Yahoo! Research Barcelona and mentored by Prof. Ricardo Baeza-Yates and Dr. B. Barla Cambazoglu. Prof. Alistair Moffat (University of Melbourne), Dr. Christina Lioma (University of Copenhagen) and Prof. Kjell Bratsbergsengen (NTNU) served as dissertation committee members. Available online at: http://www.idi.ntnu.no/research/doctor_theses/simonj.pdf." @default.
- W2040708107 created "2016-06-24" @default.
- W2040708107 creator A5049157067 @default.
- W2040708107 date "2012-06-07" @default.
- W2040708107 modified "2023-10-16" @default.
- W2040708107 title "Efficient query processing in distributed search engines" @default.
- W2040708107 cites W142762403 @default.
- W2040708107 cites W1482214997 @default.
- W2040708107 cites W1483313504 @default.
- W2040708107 cites W1490473477 @default.
- W2040708107 cites W1492896593 @default.
- W2040708107 cites W1495124840 @default.
- W2040708107 cites W1496881388 @default.
- W2040708107 cites W1524501441 @default.
- W2040708107 cites W1539242655 @default.
- W2040708107 cites W1550088701 @default.
- W2040708107 cites W1552628010 @default.
- W2040708107 cites W1556741196 @default.
- W2040708107 cites W1557658326 @default.
- W2040708107 cites W1561988317 @default.
- W2040708107 cites W1569487506 @default.
- W2040708107 cites W1569709403 @default.
- W2040708107 cites W1580375125 @default.
- W2040708107 cites W1580892610 @default.
- W2040708107 cites W1580930026 @default.
- W2040708107 cites W1602105900 @default.
- W2040708107 cites W1606874436 @default.
- W2040708107 cites W1608412409 @default.
- W2040708107 cites W1845198550 @default.
- W2040708107 cites W1852700332 @default.
- W2040708107 cites W1885526678 @default.
- W2040708107 cites W1929352279 @default.
- W2040708107 cites W1963485567 @default.
- W2040708107 cites W1965172494 @default.
- W2040708107 cites W1973355801 @default.
- W2040708107 cites W1973520416 @default.
- W2040708107 cites W1975709346 @default.
- W2040708107 cites W1977841655 @default.
- W2040708107 cites W1978063867 @default.
- W2040708107 cites W1978690967 @default.
- W2040708107 cites W1980344365 @default.
- W2040708107 cites W1982858363 @default.
- W2040708107 cites W1984614894 @default.
- W2040708107 cites W1990129631 @default.
- W2040708107 cites W1991360400 @default.
- W2040708107 cites W1997214779 @default.
- W2040708107 cites W1999747591 @default.
- W2040708107 cites W2001663465 @default.
- W2040708107 cites W2006307108 @default.
- W2040708107 cites W2006997130 @default.
- W2040708107 cites W2007807439 @default.
- W2040708107 cites W2009202693 @default.
- W2040708107 cites W2012275409 @default.
- W2040708107 cites W2018230281 @default.
- W2040708107 cites W2021733154 @default.
- W2040708107 cites W2022292926 @default.
- W2040708107 cites W2025690557 @default.
- W2040708107 cites W2028083097 @default.
- W2040708107 cites W2039678943 @default.
- W2040708107 cites W2043150166 @default.
- W2040708107 cites W2046862025 @default.
- W2040708107 cites W2059056873 @default.
- W2040708107 cites W2060204338 @default.
- W2040708107 cites W2063435439 @default.
- W2040708107 cites W2064522604 @default.
- W2040708107 cites W2065472179 @default.
- W2040708107 cites W2066500746 @default.
- W2040708107 cites W2066537690 @default.
- W2040708107 cites W2066667100 @default.
- W2040708107 cites W2070493638 @default.
- W2040708107 cites W2072156548 @default.
- W2040708107 cites W2073965851 @default.
- W2040708107 cites W2075279061 @default.
- W2040708107 cites W2076214367 @default.
- W2040708107 cites W2076471773 @default.
- W2040708107 cites W2079656678 @default.
- W2040708107 cites W2082973176 @default.
- W2040708107 cites W2086253379 @default.
- W2040708107 cites W2086453025 @default.
- W2040708107 cites W2093698835 @default.
- W2040708107 cites W2096227226 @default.
- W2040708107 cites W2096749370 @default.
- W2040708107 cites W2099111758 @default.
- W2040708107 cites W2099768249 @default.
- W2040708107 cites W2100474856 @default.
- W2040708107 cites W2104588805 @default.
- W2040708107 cites W2106591686 @default.
- W2040708107 cites W2108278040 @default.
- W2040708107 cites W2110679325 @default.
- W2040708107 cites W2116504754 @default.
- W2040708107 cites W2121928206 @default.
- W2040708107 cites W2123006679 @default.
- W2040708107 cites W2125347099 @default.
- W2040708107 cites W2130417465 @default.
- W2040708107 cites W2131149563 @default.
- W2040708107 cites W2132554106 @default.
- W2040708107 cites W2132858787 @default.
- W2040708107 cites W2133628502 @default.