Matches in SemOpenAlex for { <https://semopenalex.org/work/W2343474118> ?p ?o ?g. }
- W2343474118 endingPage "23" @default.
- W2343474118 startingPage "1" @default.
- W2343474118 abstract "Indexing highly repetitive collections has become a relevant problem with the emergence of large repositories of versioned documents, among other applications. These collections may reach huge sizes, but are formed mostly of documents that are near-copies of others. Traditional techniques for indexing these collections fail to properly exploit their regularities in order to reduce space. We introduce new techniques for compressing inverted indexes that exploit this near-copy regularity. They are based on run-length, Lempel-Ziv, or grammar compression of the differential inverted lists, instead of the usual practice of gap-encoding them. We show that, in this highly repetitive setting, our compression methods significantly reduce the space obtained with classical techniques, at the price of moderate slowdowns. Moreover, our best methods are universal, that is, they do not need to know the versioning structure of the collection, nor that a clear versioning structure even exists. We also introduce compressed self-indexes in the comparison. These are designed for general strings (not only natural language texts) and represent the text collection plus the index structure (not an inverted index) in integrated form. We show that these techniques can compress much further, using a small fraction of the space required by our new inverted indexes. Yet, they are orders of magnitude slower." @default.
- W2343474118 created "2016-06-24" @default.
- W2343474118 creator A5024008175 @default.
- W2343474118 creator A5065554261 @default.
- W2343474118 creator A5080743153 @default.
- W2343474118 creator A5091203911 @default.
- W2343474118 date "2016-10-01" @default.
- W2343474118 modified "2023-09-27" @default.
- W2343474118 title "Universal indexes for highly repetitive document collections" @default.
- W2343474118 cites W110689283 @default.
- W2343474118 cites W15354235 @default.
- W2343474118 cites W1553071595 @default.
- W2343474118 cites W1556741196 @default.
- W2343474118 cites W1556744446 @default.
- W2343474118 cites W1559631118 @default.
- W2343474118 cites W1567561846 @default.
- W2343474118 cites W1798412263 @default.
- W2343474118 cites W179872536 @default.
- W2343474118 cites W1878541814 @default.
- W2343474118 cites W1969838114 @default.
- W2343474118 cites W1980344365 @default.
- W2343474118 cites W1984614894 @default.
- W2343474118 cites W1985136582 @default.
- W2343474118 cites W1988679864 @default.
- W2343474118 cites W1989749956 @default.
- W2343474118 cites W1990244497 @default.
- W2343474118 cites W1996930216 @default.
- W2343474118 cites W2013849299 @default.
- W2343474118 cites W2019406253 @default.
- W2343474118 cites W2022292926 @default.
- W2343474118 cites W2022507549 @default.
- W2343474118 cites W2025690557 @default.
- W2343474118 cites W2031780529 @default.
- W2343474118 cites W2033005962 @default.
- W2343474118 cites W2041824945 @default.
- W2343474118 cites W2046038806 @default.
- W2343474118 cites W2052867877 @default.
- W2343474118 cites W2061986359 @default.
- W2343474118 cites W2063694594 @default.
- W2343474118 cites W2072021854 @default.
- W2343474118 cites W2076471773 @default.
- W2343474118 cites W2087361130 @default.
- W2343474118 cites W2088386938 @default.
- W2343474118 cites W2089455813 @default.
- W2343474118 cites W2094154930 @default.
- W2343474118 cites W2097589086 @default.
- W2343474118 cites W2101881908 @default.
- W2343474118 cites W2107745473 @default.
- W2343474118 cites W2108079923 @default.
- W2343474118 cites W2108974085 @default.
- W2343474118 cites W2113004376 @default.
- W2343474118 cites W2132809979 @default.
- W2343474118 cites W2136070674 @default.
- W2343474118 cites W2138662031 @default.
- W2343474118 cites W2140453381 @default.
- W2343474118 cites W2141931308 @default.
- W2343474118 cites W2142905080 @default.
- W2343474118 cites W2148113067 @default.
- W2343474118 cites W2152437528 @default.
- W2343474118 cites W2155512447 @default.
- W2343474118 cites W2157714561 @default.
- W2343474118 cites W2170907470 @default.
- W2343474118 cites W226134553 @default.
- W2343474118 cites W2602771387 @default.
- W2343474118 cites W4254779780 @default.
- W2343474118 cites W2090283421 @default.
- W2343474118 doi "https://doi.org/10.1016/j.is.2016.04.002" @default.
- W2343474118 hasPublicationYear "2016" @default.
- W2343474118 type Work @default.
- W2343474118 sameAs 2343474118 @default.
- W2343474118 citedByCount "23" @default.
- W2343474118 countsByYear W23434741182012 @default.
- W2343474118 countsByYear W23434741182017 @default.
- W2343474118 countsByYear W23434741182018 @default.
- W2343474118 countsByYear W23434741182019 @default.
- W2343474118 countsByYear W23434741182020 @default.
- W2343474118 countsByYear W23434741182021 @default.
- W2343474118 countsByYear W23434741182022 @default.
- W2343474118 crossrefType "journal-article" @default.
- W2343474118 hasAuthorship W2343474118A5024008175 @default.
- W2343474118 hasAuthorship W2343474118A5065554261 @default.
- W2343474118 hasAuthorship W2343474118A5080743153 @default.
- W2343474118 hasAuthorship W2343474118A5091203911 @default.
- W2343474118 hasBestOaLocation W23434741182 @default.
- W2343474118 hasConcept C111919701 @default.
- W2343474118 hasConcept C124101348 @default.
- W2343474118 hasConcept C130590232 @default.
- W2343474118 hasConcept C162319229 @default.
- W2343474118 hasConcept C165696696 @default.
- W2343474118 hasConcept C199360897 @default.
- W2343474118 hasConcept C23123220 @default.
- W2343474118 hasConcept C2777382242 @default.
- W2343474118 hasConcept C2778572836 @default.
- W2343474118 hasConcept C38652104 @default.
- W2343474118 hasConcept C41008148 @default.
- W2343474118 hasConcept C75165309 @default.
- W2343474118 hasConcept C80444323 @default.
- W2343474118 hasConceptScore W2343474118C111919701 @default.