Matches in SemOpenAlex for { <https://semopenalex.org/work/W2588006593> ?p ?o ?g. }
Showing items 1 to 79 of 79 with 100 items per page.
- W2588006593 endingPage "750" @default.
- W2588006593 startingPage "735" @default.
- W2588006593 abstract "New, purely vision-based, segmentation technique is formally described. Only a few simple visual cues are used to assess similarity of the rectangles. Its performance better by an order of magnitude when compared with competition. Rectangle clustering is a viable way to perform web page segmentation. This paper presents a novel approach to web page segmentation, which is one of substantial preprocessing steps when mining data from web documents. Most of the current segmentation methods are based on algorithms that work on a tree representation of web pages (DOM tree or a hierarchical rendering model) and produce another tree structure as an output. In contrast, our method uses a rendering engine to get an image of the web page, takes the smallest rendered elements of that image, performs clustering using a custom algorithm and produces a flat set of segments of a given granularity. For the clustering metrics, we use purely visual properties only: the distance of elements and their visual similarity. We experimentally evaluate the properties of our algorithm by processing 2400 web pages. On this set of web pages, we prove that our algorithm is almost 90% faster than the reference algorithm. We also show that our algorithm accuracy is between 47% and 133% of the reference algorithm accuracy with indirect correlation of our algorithms accuracy to the depth of inspected page structure. In our experiments, we also demonstrate the advantages of producing a flat segmentation structure instead of an hierarchy." @default.
- W2588006593 created "2017-02-24" @default.
- W2588006593 creator A5040028014 @default.
- W2588006593 creator A5064793926 @default.
- W2588006593 creator A5083684509 @default.
- W2588006593 date "2017-05-01" @default.
- W2588006593 modified "2023-10-10" @default.
- W2588006593 title "Box clustering segmentation: A new method for vision-based web page preprocessing" @default.
- W2588006593 cites W1858861444 @default.
- W2588006593 cites W1974425785 @default.
- W2588006593 cites W1977746397 @default.
- W2588006593 cites W1982951827 @default.
- W2588006593 cites W2009759761 @default.
- W2588006593 cites W2015727453 @default.
- W2588006593 cites W2049488566 @default.
- W2588006593 cites W2057857570 @default.
- W2588006593 cites W2076910790 @default.
- W2588006593 cites W2109295713 @default.
- W2588006593 cites W2134907429 @default.
- W2588006593 cites W2160189941 @default.
- W2588006593 cites W2253768319 @default.
- W2588006593 cites W2400661088 @default.
- W2588006593 cites W4235169531 @default.
- W2588006593 doi "https://doi.org/10.1016/j.ipm.2017.02.002" @default.
- W2588006593 hasPublicationYear "2017" @default.
- W2588006593 type Work @default.
- W2588006593 sameAs 2588006593 @default.
- W2588006593 citedByCount "27" @default.
- W2588006593 countsByYear W25880065932018 @default.
- W2588006593 countsByYear W25880065932019 @default.
- W2588006593 countsByYear W25880065932020 @default.
- W2588006593 countsByYear W25880065932021 @default.
- W2588006593 countsByYear W25880065932022 @default.
- W2588006593 countsByYear W25880065932023 @default.
- W2588006593 crossrefType "journal-article" @default.
- W2588006593 hasAuthorship W2588006593A5040028014 @default.
- W2588006593 hasAuthorship W2588006593A5064793926 @default.
- W2588006593 hasAuthorship W2588006593A5083684509 @default.
- W2588006593 hasConcept C136764020 @default.
- W2588006593 hasConcept C153180895 @default.
- W2588006593 hasConcept C154945302 @default.
- W2588006593 hasConcept C21959979 @default.
- W2588006593 hasConcept C23123220 @default.
- W2588006593 hasConcept C31972630 @default.
- W2588006593 hasConcept C34736171 @default.
- W2588006593 hasConcept C41008148 @default.
- W2588006593 hasConcept C73555534 @default.
- W2588006593 hasConcept C89600930 @default.
- W2588006593 hasConceptScore W2588006593C136764020 @default.
- W2588006593 hasConceptScore W2588006593C153180895 @default.
- W2588006593 hasConceptScore W2588006593C154945302 @default.
- W2588006593 hasConceptScore W2588006593C21959979 @default.
- W2588006593 hasConceptScore W2588006593C23123220 @default.
- W2588006593 hasConceptScore W2588006593C31972630 @default.
- W2588006593 hasConceptScore W2588006593C34736171 @default.
- W2588006593 hasConceptScore W2588006593C41008148 @default.
- W2588006593 hasConceptScore W2588006593C73555534 @default.
- W2588006593 hasConceptScore W2588006593C89600930 @default.
- W2588006593 hasIssue "3" @default.
- W2588006593 hasLocation W25880065931 @default.
- W2588006593 hasOpenAccess W2588006593 @default.
- W2588006593 hasPrimaryLocation W25880065931 @default.
- W2588006593 hasRelatedWork W1669643531 @default.
- W2588006593 hasRelatedWork W2005437358 @default.
- W2588006593 hasRelatedWork W2008656436 @default.
- W2588006593 hasRelatedWork W2036075313 @default.
- W2588006593 hasRelatedWork W2039154422 @default.
- W2588006593 hasRelatedWork W2122581818 @default.
- W2588006593 hasRelatedWork W2134924024 @default.
- W2588006593 hasRelatedWork W2517104666 @default.
- W2588006593 hasRelatedWork W2895616727 @default.
- W2588006593 hasRelatedWork W2182382398 @default.
- W2588006593 hasVolume "53" @default.
- W2588006593 isParatext "false" @default.
- W2588006593 isRetracted "false" @default.
- W2588006593 magId "2588006593" @default.
- W2588006593 workType "article" @default.