Matches in SemOpenAlex for { <https://semopenalex.org/work/W1738382303> ?p ?o ?g. }
- W1738382303 endingPage "12" @default.
- W1738382303 startingPage "5" @default.
- W1738382303 abstract "This article introduces a new approach for content extraction that exploits the hierarchical inter-relations of the elements in a webpage. Content extraction is a technique used to extract the main textual content from a webpage. This is useful for filtering out the advertisements and all the additional information that is not part of the main content. The main idea behind our approach is to use the DOM tree as an explicit representation of the inter-relations of the elements in a webpage. Using the information contained in the DOM tree, we can identify blocks of content and easily determine which of the blocks contains more text. Thanks to this information, the technique achieves considerable recall and precision. Using the DOM structure for content extraction gives us the benefits of other approaches based on the syntax of the webpage (such as characters, words and tags), but it also gives us very precise information regarding the related components in a block, thus producing very cohesive blocks." @default.
- W1738382303 created "2016-06-24" @default.
- W1738382303 creator A5002273350 @default.
- W1738382303 creator A5003503212 @default.
- W1738382303 creator A5071216572 @default.
- W1738382303 date "2012-06-30" @default.
- W1738382303 modified "2023-09-27" @default.
- W1738382303 title "Content Extraction based on Hierarchical Relations in DOM Structures" @default.
- W1738382303 cites W116870935 @default.
- W1738382303 cites W1553019137 @default.
- W1738382303 cites W2007442002 @default.
- W1738382303 cites W2012575882 @default.
- W1738382303 cites W2019577381 @default.
- W1738382303 cites W2024791376 @default.
- W1738382303 cites W2084358158 @default.
- W1738382303 cites W2093559286 @default.
- W1738382303 cites W2117209866 @default.
- W1738382303 cites W2117694587 @default.
- W1738382303 cites W2120101509 @default.
- W1738382303 cites W2125570474 @default.
- W1738382303 cites W2151588647 @default.
- W1738382303 cites W2154484245 @default.
- W1738382303 cites W22461475 @default.
- W1738382303 cites W2520948221 @default.
- W1738382303 cites W46681494 @default.
- W1738382303 doi "https://doi.org/10.17562/pb-45-1" @default.
- W1738382303 hasPublicationYear "2012" @default.
- W1738382303 type Work @default.
- W1738382303 sameAs 1738382303 @default.
- W1738382303 citedByCount "0" @default.
- W1738382303 crossrefType "journal-article" @default.
- W1738382303 hasAuthorship W1738382303A5002273350 @default.
- W1738382303 hasAuthorship W1738382303A5003503212 @default.
- W1738382303 hasAuthorship W1738382303A5071216572 @default.
- W1738382303 hasBestOaLocation W17383823032 @default.
- W1738382303 hasConcept C106131492 @default.
- W1738382303 hasConcept C113174947 @default.
- W1738382303 hasConcept C134306372 @default.
- W1738382303 hasConcept C136764020 @default.
- W1738382303 hasConcept C137922610 @default.
- W1738382303 hasConcept C154945302 @default.
- W1738382303 hasConcept C17744445 @default.
- W1738382303 hasConcept C195807954 @default.
- W1738382303 hasConcept C199539241 @default.
- W1738382303 hasConcept C21959979 @default.
- W1738382303 hasConcept C23123220 @default.
- W1738382303 hasConcept C2524010 @default.
- W1738382303 hasConcept C2776359362 @default.
- W1738382303 hasConcept C2777210771 @default.
- W1738382303 hasConcept C2778152352 @default.
- W1738382303 hasConcept C31972630 @default.
- W1738382303 hasConcept C33923547 @default.
- W1738382303 hasConcept C41008148 @default.
- W1738382303 hasConcept C60048249 @default.
- W1738382303 hasConcept C81669768 @default.
- W1738382303 hasConcept C94625758 @default.
- W1738382303 hasConceptScore W1738382303C106131492 @default.
- W1738382303 hasConceptScore W1738382303C113174947 @default.
- W1738382303 hasConceptScore W1738382303C134306372 @default.
- W1738382303 hasConceptScore W1738382303C136764020 @default.
- W1738382303 hasConceptScore W1738382303C137922610 @default.
- W1738382303 hasConceptScore W1738382303C154945302 @default.
- W1738382303 hasConceptScore W1738382303C17744445 @default.
- W1738382303 hasConceptScore W1738382303C195807954 @default.
- W1738382303 hasConceptScore W1738382303C199539241 @default.
- W1738382303 hasConceptScore W1738382303C21959979 @default.
- W1738382303 hasConceptScore W1738382303C23123220 @default.
- W1738382303 hasConceptScore W1738382303C2524010 @default.
- W1738382303 hasConceptScore W1738382303C2776359362 @default.
- W1738382303 hasConceptScore W1738382303C2777210771 @default.
- W1738382303 hasConceptScore W1738382303C2778152352 @default.
- W1738382303 hasConceptScore W1738382303C31972630 @default.
- W1738382303 hasConceptScore W1738382303C33923547 @default.
- W1738382303 hasConceptScore W1738382303C41008148 @default.
- W1738382303 hasConceptScore W1738382303C60048249 @default.
- W1738382303 hasConceptScore W1738382303C81669768 @default.
- W1738382303 hasConceptScore W1738382303C94625758 @default.
- W1738382303 hasLocation W17383823031 @default.
- W1738382303 hasLocation W17383823032 @default.
- W1738382303 hasOpenAccess W1738382303 @default.
- W1738382303 hasPrimaryLocation W17383823031 @default.
- W1738382303 hasRelatedWork W1551326255 @default.
- W1738382303 hasRelatedWork W1970061088 @default.
- W1738382303 hasRelatedWork W1970439844 @default.
- W1738382303 hasRelatedWork W1977746397 @default.
- W1738382303 hasRelatedWork W1991565788 @default.
- W1738382303 hasRelatedWork W1995855187 @default.
- W1738382303 hasRelatedWork W2142725362 @default.
- W1738382303 hasRelatedWork W2149964649 @default.
- W1738382303 hasRelatedWork W2167492315 @default.
- W1738382303 hasRelatedWork W2253768319 @default.
- W1738382303 hasRelatedWork W2295318420 @default.
- W1738382303 hasRelatedWork W2315064808 @default.
- W1738382303 hasRelatedWork W2355247546 @default.
- W1738382303 hasRelatedWork W2367031390 @default.
- W1738382303 hasRelatedWork W2384886272 @default.
- W1738382303 hasRelatedWork W2464591306 @default.
- W1738382303 hasRelatedWork W2510348187 @default.