Matches in SemOpenAlex for { <https://semopenalex.org/work/W2156893245> ?p ?o ?g. }
- W2156893245 endingPage "340" @default.
- W2156893245 startingPage "331" @default.
- W2156893245 abstract "Web archives preserve the fast-changing Web, yet are highly incomplete due to crawling restrictions, crawling depth and frequency, or restrictive selection policies; most of the Web is unarchived and therefore lost to posterity. In this paper, we propose an approach to recover significant parts of the unarchived Web, by reconstructing descriptions of these pages based on links and anchors in the set of crawled pages, and experiment with this approach on the Dutch Web archive. Our main findings are threefold. First, the crawled Web contains evidence of a remarkable number of unarchived pages and websites, potentially dramatically increasing the coverage of the Web archive. Second, the link and anchor descriptions have a highly skewed distribution: popular pages such as home pages have more terms, but the richness tapers off quickly. Third, the succinct representation is generally rich enough to uniquely identify pages on the unarchived Web: in a known-item search setting we can retrieve these pages within the first ranks on average." @default.
- W2156893245 created "2016-06-24" @default.
- W2156893245 creator A5044511901 @default.
- W2156893245 creator A5046069889 @default.
- W2156893245 creator A5054097060 @default.
- W2156893245 creator A5058124570 @default.
- W2156893245 creator A5060259737 @default.
- W2156893245 date "2014-09-08" @default.
- W2156893245 modified "2023-09-23" @default.
- W2156893245 title "Finding pages on the unarchived web" @default.
- W2156893245 cites W1489893579 @default.
- W2156893245 cites W1564027019 @default.
- W2156893245 cites W1854214752 @default.
- W2156893245 cites W1971772794 @default.
- W2156893245 cites W1989468977 @default.
- W2156893245 cites W2007571138 @default.
- W2156893245 cites W2009140921 @default.
- W2156893245 cites W2019194162 @default.
- W2156893245 cites W2043275333 @default.
- W2156893245 cites W2049718889 @default.
- W2156893245 cites W2060370185 @default.
- W2156893245 cites W2076086413 @default.
- W2156893245 cites W2083745421 @default.
- W2156893245 cites W2138621811 @default.
- W2156893245 cites W2147057843 @default.
- W2156893245 cites W2147872511 @default.
- W2156893245 cites W2164052363 @default.
- W2156893245 cites W2171710828 @default.
- W2156893245 cites W2293827470 @default.
- W2156893245 cites W2913520302 @default.
- W2156893245 cites W2969798785 @default.
- W2156893245 cites W3204318296 @default.
- W2156893245 cites W751588 @default.
- W2156893245 doi "https://doi.org/10.5555/2740769.2740827" @default.
- W2156893245 hasPublicationYear "2014" @default.
- W2156893245 type Work @default.
- W2156893245 sameAs 2156893245 @default.
- W2156893245 citedByCount "5" @default.
- W2156893245 countsByYear W21568932452015 @default.
- W2156893245 countsByYear W21568932452016 @default.
- W2156893245 countsByYear W21568932452017 @default.
- W2156893245 countsByYear W21568932452019 @default.
- W2156893245 crossrefType "proceedings-article" @default.
- W2156893245 hasAuthorship W2156893245A5044511901 @default.
- W2156893245 hasAuthorship W2156893245A5046069889 @default.
- W2156893245 hasAuthorship W2156893245A5054097060 @default.
- W2156893245 hasAuthorship W2156893245A5058124570 @default.
- W2156893245 hasAuthorship W2156893245A5060259737 @default.
- W2156893245 hasConcept C100368936 @default.
- W2156893245 hasConcept C105702510 @default.
- W2156893245 hasConcept C136764020 @default.
- W2156893245 hasConcept C13743948 @default.
- W2156893245 hasConcept C154945302 @default.
- W2156893245 hasConcept C173576120 @default.
- W2156893245 hasConcept C177264268 @default.
- W2156893245 hasConcept C195409031 @default.
- W2156893245 hasConcept C199360897 @default.
- W2156893245 hasConcept C21959979 @default.
- W2156893245 hasConcept C23123220 @default.
- W2156893245 hasConcept C41008148 @default.
- W2156893245 hasConcept C521815418 @default.
- W2156893245 hasConcept C71924100 @default.
- W2156893245 hasConcept C79373723 @default.
- W2156893245 hasConcept C81917197 @default.
- W2156893245 hasConceptScore W2156893245C100368936 @default.
- W2156893245 hasConceptScore W2156893245C105702510 @default.
- W2156893245 hasConceptScore W2156893245C136764020 @default.
- W2156893245 hasConceptScore W2156893245C13743948 @default.
- W2156893245 hasConceptScore W2156893245C154945302 @default.
- W2156893245 hasConceptScore W2156893245C173576120 @default.
- W2156893245 hasConceptScore W2156893245C177264268 @default.
- W2156893245 hasConceptScore W2156893245C195409031 @default.
- W2156893245 hasConceptScore W2156893245C199360897 @default.
- W2156893245 hasConceptScore W2156893245C21959979 @default.
- W2156893245 hasConceptScore W2156893245C23123220 @default.
- W2156893245 hasConceptScore W2156893245C41008148 @default.
- W2156893245 hasConceptScore W2156893245C521815418 @default.
- W2156893245 hasConceptScore W2156893245C71924100 @default.
- W2156893245 hasConceptScore W2156893245C79373723 @default.
- W2156893245 hasConceptScore W2156893245C81917197 @default.
- W2156893245 hasLocation W21568932451 @default.
- W2156893245 hasOpenAccess W2156893245 @default.
- W2156893245 hasPrimaryLocation W21568932451 @default.
- W2156893245 hasRelatedWork W1490202509 @default.
- W2156893245 hasRelatedWork W155214347 @default.
- W2156893245 hasRelatedWork W175372466 @default.
- W2156893245 hasRelatedWork W1905830520 @default.
- W2156893245 hasRelatedWork W1971956037 @default.
- W2156893245 hasRelatedWork W1976296352 @default.
- W2156893245 hasRelatedWork W1981713872 @default.
- W2156893245 hasRelatedWork W2005577266 @default.
- W2156893245 hasRelatedWork W2060370185 @default.
- W2156893245 hasRelatedWork W2098120493 @default.
- W2156893245 hasRelatedWork W2146847787 @default.
- W2156893245 hasRelatedWork W2156928086 @default.
- W2156893245 hasRelatedWork W2160773718 @default.
- W2156893245 hasRelatedWork W2165406402 @default.
- W2156893245 hasRelatedWork W2169189540 @default.