Matches in SemOpenAlex for { <https://semopenalex.org/work/W1226131540> ?p ?o ?g. }
Showing items 1 to 80 of 80 with 100 items per page.
- W1226131540 abstract "Template detection and content extraction are two of the main areas of information retrieval applied to the Web. They perform different analyses over the structure and content of webpages to extract some part of the document. However, their objective is different. While template detection identifies the template of a webpage (usually comparing with other webpages of the same website), content extraction identifies the main content of the webpage discarding the other part. Therefore, they are somehow complementary, because the main content is not part of the template. It has been measured that templates represent between 40% and 50% of data on the Web. Therefore, identifying templates is essential for indexing tasks because templates usually contain irrelevant information such as advertisements, menus and banners. Processing and storing this information is likely to lead to a waste of resources (storage space, bandwidth, etc.). Similarly, identifying the main content is essential for many information retrieval tasks. In this paper, we present a benchmark suite to test different approaches for template detection and content extraction. The suite is public, and it contains real heterogeneous webpages that have been labelled so that different techniques can be suitably (and automatically) compared." @default.
- W1226131540 created "2016-06-24" @default.
- W1226131540 creator A5003503212 @default.
- W1226131540 creator A5016645788 @default.
- W1226131540 creator A5060938121 @default.
- W1226131540 creator A5071216572 @default.
- W1226131540 date "2014-09-22" @default.
- W1226131540 modified "2023-09-27" @default.
- W1226131540 title "A Benchmark Suite for Template Detection and Content Extraction." @default.
- W1226131540 cites W15548 @default.
- W1226131540 cites W2048192672 @default.
- W1226131540 hasPublicationYear "2014" @default.
- W1226131540 type Work @default.
- W1226131540 sameAs 1226131540 @default.
- W1226131540 citedByCount "1" @default.
- W1226131540 countsByYear W12261315402016 @default.
- W1226131540 crossrefType "posted-content" @default.
- W1226131540 hasAuthorship W1226131540A5003503212 @default.
- W1226131540 hasAuthorship W1226131540A5016645788 @default.
- W1226131540 hasAuthorship W1226131540A5060938121 @default.
- W1226131540 hasAuthorship W1226131540A5071216572 @default.
- W1226131540 hasConcept C124101348 @default.
- W1226131540 hasConcept C13280743 @default.
- W1226131540 hasConcept C136764020 @default.
- W1226131540 hasConcept C166957645 @default.
- W1226131540 hasConcept C185798385 @default.
- W1226131540 hasConcept C195807954 @default.
- W1226131540 hasConcept C199360897 @default.
- W1226131540 hasConcept C205649164 @default.
- W1226131540 hasConcept C21959979 @default.
- W1226131540 hasConcept C23123220 @default.
- W1226131540 hasConcept C41008148 @default.
- W1226131540 hasConcept C75165309 @default.
- W1226131540 hasConcept C77088390 @default.
- W1226131540 hasConcept C79581498 @default.
- W1226131540 hasConcept C82714645 @default.
- W1226131540 hasConcept C95457728 @default.
- W1226131540 hasConceptScore W1226131540C124101348 @default.
- W1226131540 hasConceptScore W1226131540C13280743 @default.
- W1226131540 hasConceptScore W1226131540C136764020 @default.
- W1226131540 hasConceptScore W1226131540C166957645 @default.
- W1226131540 hasConceptScore W1226131540C185798385 @default.
- W1226131540 hasConceptScore W1226131540C195807954 @default.
- W1226131540 hasConceptScore W1226131540C199360897 @default.
- W1226131540 hasConceptScore W1226131540C205649164 @default.
- W1226131540 hasConceptScore W1226131540C21959979 @default.
- W1226131540 hasConceptScore W1226131540C23123220 @default.
- W1226131540 hasConceptScore W1226131540C41008148 @default.
- W1226131540 hasConceptScore W1226131540C75165309 @default.
- W1226131540 hasConceptScore W1226131540C77088390 @default.
- W1226131540 hasConceptScore W1226131540C79581498 @default.
- W1226131540 hasConceptScore W1226131540C82714645 @default.
- W1226131540 hasConceptScore W1226131540C95457728 @default.
- W1226131540 hasLocation W12261315401 @default.
- W1226131540 hasOpenAccess W1226131540 @default.
- W1226131540 hasPrimaryLocation W12261315401 @default.
- W1226131540 hasRelatedWork W1437142549 @default.
- W1226131540 hasRelatedWork W1544491176 @default.
- W1226131540 hasRelatedWork W1544851065 @default.
- W1226131540 hasRelatedWork W1942083547 @default.
- W1226131540 hasRelatedWork W2013755777 @default.
- W1226131540 hasRelatedWork W2042970189 @default.
- W1226131540 hasRelatedWork W2072489225 @default.
- W1226131540 hasRelatedWork W2157027287 @default.
- W1226131540 hasRelatedWork W2187397465 @default.
- W1226131540 hasRelatedWork W2201534957 @default.
- W1226131540 hasRelatedWork W2275993472 @default.
- W1226131540 hasRelatedWork W2309642097 @default.
- W1226131540 hasRelatedWork W2373402338 @default.
- W1226131540 hasRelatedWork W2472420721 @default.
- W1226131540 hasRelatedWork W2476847310 @default.
- W1226131540 hasRelatedWork W2897171874 @default.
- W1226131540 hasRelatedWork W3144508074 @default.
- W1226131540 hasRelatedWork W36911888 @default.
- W1226131540 hasRelatedWork W761550832 @default.
- W1226131540 hasRelatedWork W2976223991 @default.
- W1226131540 isParatext "false" @default.
- W1226131540 isRetracted "false" @default.
- W1226131540 magId "1226131540" @default.
- W1226131540 workType "article" @default.