Matches in SemOpenAlex for { <https://semopenalex.org/work/W2140077965> ?p ?o ?g. }
Showing items 1 to 97 of 97 with 100 items per page.
- W2140077965 abstract "Many Web IR and Digital Library applications require a crawling process to collect pages with the ultimate goal of taking advantage of useful information available on Web sites. For some of these applications the criteria to determine when a page is to be present in a collection are related to the page content. However, there are situations in which the inner structure of the pages provides a better criteria to guide the crawling process than their content. In this paper, we present a structure-driven approach for generating Web crawlers that requires a minimum effort from users. The idea is to take as input a sample page and an entry point to a Web site and generate a structure-driven crawler based on navigation patterns, sequences of patterns for the links a crawler has to follow to reach the pages structurally similar to the sample page. In the experiments we have carried out, structure-driven crawlers generated by our new approach were able to collect all pages that match the samples given, including those pages added after their generation." @default.
- W2140077965 created "2016-06-24" @default.
- W2140077965 creator A5013954836 @default.
- W2140077965 creator A5015763962 @default.
- W2140077965 creator A5015781017 @default.
- W2140077965 creator A5078104555 @default.
- W2140077965 date "2006-08-06" @default.
- W2140077965 modified "2023-10-17" @default.
- W2140077965 title "Structure-driven crawler generation by example" @default.
- W2140077965 cites W1489992655 @default.
- W2140077965 cites W1599730494 @default.
- W2140077965 cites W1973625941 @default.
- W2140077965 cites W2007687650 @default.
- W2140077965 cites W2017726337 @default.
- W2140077965 cites W2022120051 @default.
- W2140077965 cites W2048001624 @default.
- W2140077965 cites W2049365470 @default.
- W2140077965 cites W2049461910 @default.
- W2140077965 cites W2057308086 @default.
- W2140077965 cites W2066309116 @default.
- W2140077965 cites W2110780226 @default.
- W2140077965 cites W2128915886 @default.
- W2140077965 cites W2129595335 @default.
- W2140077965 cites W2143309843 @default.
- W2140077965 cites W2150721933 @default.
- W2140077965 cites W4243938628 @default.
- W2140077965 doi "https://doi.org/10.1145/1148170.1148223" @default.
- W2140077965 hasPublicationYear "2006" @default.
- W2140077965 type Work @default.
- W2140077965 sameAs 2140077965 @default.
- W2140077965 citedByCount "43" @default.
- W2140077965 countsByYear W21400779652012 @default.
- W2140077965 countsByYear W21400779652013 @default.
- W2140077965 countsByYear W21400779652014 @default.
- W2140077965 countsByYear W21400779652015 @default.
- W2140077965 countsByYear W21400779652017 @default.
- W2140077965 countsByYear W21400779652019 @default.
- W2140077965 countsByYear W21400779652021 @default.
- W2140077965 countsByYear W21400779652022 @default.
- W2140077965 crossrefType "proceedings-article" @default.
- W2140077965 hasAuthorship W2140077965A5013954836 @default.
- W2140077965 hasAuthorship W2140077965A5015763962 @default.
- W2140077965 hasAuthorship W2140077965A5015781017 @default.
- W2140077965 hasAuthorship W2140077965A5078104555 @default.
- W2140077965 hasConcept C100368936 @default.
- W2140077965 hasConcept C105702510 @default.
- W2140077965 hasConcept C111919701 @default.
- W2140077965 hasConcept C136764020 @default.
- W2140077965 hasConcept C13743948 @default.
- W2140077965 hasConcept C173576120 @default.
- W2140077965 hasConcept C21959979 @default.
- W2140077965 hasConcept C23123220 @default.
- W2140077965 hasConcept C2524010 @default.
- W2140077965 hasConcept C28719098 @default.
- W2140077965 hasConcept C33923547 @default.
- W2140077965 hasConcept C41008148 @default.
- W2140077965 hasConcept C521815418 @default.
- W2140077965 hasConcept C61096286 @default.
- W2140077965 hasConcept C67617509 @default.
- W2140077965 hasConcept C71924100 @default.
- W2140077965 hasConcept C73340581 @default.
- W2140077965 hasConcept C98045186 @default.
- W2140077965 hasConceptScore W2140077965C100368936 @default.
- W2140077965 hasConceptScore W2140077965C105702510 @default.
- W2140077965 hasConceptScore W2140077965C111919701 @default.
- W2140077965 hasConceptScore W2140077965C136764020 @default.
- W2140077965 hasConceptScore W2140077965C13743948 @default.
- W2140077965 hasConceptScore W2140077965C173576120 @default.
- W2140077965 hasConceptScore W2140077965C21959979 @default.
- W2140077965 hasConceptScore W2140077965C23123220 @default.
- W2140077965 hasConceptScore W2140077965C2524010 @default.
- W2140077965 hasConceptScore W2140077965C28719098 @default.
- W2140077965 hasConceptScore W2140077965C33923547 @default.
- W2140077965 hasConceptScore W2140077965C41008148 @default.
- W2140077965 hasConceptScore W2140077965C521815418 @default.
- W2140077965 hasConceptScore W2140077965C61096286 @default.
- W2140077965 hasConceptScore W2140077965C67617509 @default.
- W2140077965 hasConceptScore W2140077965C71924100 @default.
- W2140077965 hasConceptScore W2140077965C73340581 @default.
- W2140077965 hasConceptScore W2140077965C98045186 @default.
- W2140077965 hasLocation W21400779651 @default.
- W2140077965 hasOpenAccess W2140077965 @default.
- W2140077965 hasPrimaryLocation W21400779651 @default.
- W2140077965 hasRelatedWork W1506122440 @default.
- W2140077965 hasRelatedWork W1971956037 @default.
- W2140077965 hasRelatedWork W2026132847 @default.
- W2140077965 hasRelatedWork W2051135816 @default.
- W2140077965 hasRelatedWork W2078731629 @default.
- W2140077965 hasRelatedWork W2186692612 @default.
- W2140077965 hasRelatedWork W2772576376 @default.
- W2140077965 hasRelatedWork W2941499861 @default.
- W2140077965 hasRelatedWork W2997495867 @default.
- W2140077965 hasRelatedWork W3184561988 @default.
- W2140077965 isParatext "false" @default.
- W2140077965 isRetracted "false" @default.
- W2140077965 magId "2140077965" @default.
- W2140077965 workType "article" @default.