Matches in SemOpenAlex for { <https://semopenalex.org/work/W2071542004> ?p ?o ?g. }
- W2071542004 abstract "Sitemaps designed by webmasters not only present the main usage flows for users but also organize the hierarchical concepts of a website. However, websites seldom provide sitemap pages to help users browse pages easily. Even when provided, these sitemaps are not designed for machine understanding, although a few websites provide sitemaps in XML format. In this paper, we develop a system, SiteMap Generator (SMG), to automatically generate a hierarchical sitemap for a website. SMG consists of five components. Sequence Translator translates a page's HTML source into a long sequence, and Page Partitioner then splits the page into blocks by analyzing the sequence complexity. Block Identifier categorizes each block into one of three block types: content, structure, or redundant. Using the popular sequence searching tool BLAST, Block Cluster calculates similarities between blocks so that blocks with similar functionality are grouped and considered as candidate blocks for the sitemap. Finally, Hyperlink Analyzer transforms page-to-page links into block-to-block links and applies Kleinberg's HITS algorithm to estimate the authority and hub values of each block. A block entropy value derived from feature entropies is also used to improve HITS. Experiments on three websites (Mozilla, CNN, and Yahoo! News) show that SMG is useful for partitioning a page into blocks (F1=86%), identifying the block type (F1=85%), and generating the sitemap for a website (F1=63%)." @default.
- W2071542004 created "2016-06-24" @default.
- W2071542004 creator A5064942971 @default.
- W2071542004 creator A5082748932 @default.
- W2071542004 creator A5087560273 @default.
- W2071542004 date "2011-04-01" @default.
- W2071542004 modified "2023-09-26" @default.
- W2071542004 title "Automatic sitemaps generation: Exploring website structures using block extraction and hyperlink analysis" @default.
- W2071542004 cites W1489992655 @default.
- W2071542004 cites W1513332069 @default.
- W2071542004 cites W1553019137 @default.
- W2071542004 cites W1589993665 @default.
- W2071542004 cites W1854214752 @default.
- W2071542004 cites W1977001629 @default.
- W2071542004 cites W1981202432 @default.
- W2071542004 cites W1989338554 @default.
- W2071542004 cites W2005124845 @default.
- W2071542004 cites W2006119904 @default.
- W2071542004 cites W2019264297 @default.
- W2071542004 cites W2051916497 @default.
- W2071542004 cites W2055043387 @default.
- W2071542004 cites W2066636486 @default.
- W2071542004 cites W2079672501 @default.
- W2071542004 cites W2087064593 @default.
- W2071542004 cites W2106882534 @default.
- W2071542004 cites W2119966992 @default.
- W2071542004 cites W2124673015 @default.
- W2071542004 cites W2127648442 @default.
- W2071542004 cites W2136500370 @default.
- W2071542004 cites W2137079713 @default.
- W2071542004 cites W2140204390 @default.
- W2071542004 cites W2142660007 @default.
- W2071542004 cites W2143210482 @default.
- W2071542004 cites W2143733126 @default.
- W2071542004 cites W2146486820 @default.
- W2071542004 cites W2146990843 @default.
- W2071542004 cites W2167859982 @default.
- W2071542004 cites W2169347997 @default.
- W2071542004 cites W2171496353 @default.
- W2071542004 cites W2172000360 @default.
- W2071542004 doi "https://doi.org/10.1016/j.eswa.2010.09.056" @default.
- W2071542004 hasPublicationYear "2011" @default.
- W2071542004 type Work @default.
- W2071542004 sameAs 2071542004 @default.
- W2071542004 citedByCount "8" @default.
- W2071542004 countsByYear W20715420042013 @default.
- W2071542004 countsByYear W20715420042014 @default.
- W2071542004 countsByYear W20715420042016 @default.
- W2071542004 countsByYear W20715420042020 @default.
- W2071542004 countsByYear W20715420042021 @default.
- W2071542004 crossrefType "journal-article" @default.
- W2071542004 hasAuthorship W2071542004A5064942971 @default.
- W2071542004 hasAuthorship W2071542004A5082748932 @default.
- W2071542004 hasAuthorship W2071542004A5087560273 @default.
- W2071542004 hasConcept C114614502 @default.
- W2071542004 hasConcept C124101348 @default.
- W2071542004 hasConcept C136764020 @default.
- W2071542004 hasConcept C137922610 @default.
- W2071542004 hasConcept C154504017 @default.
- W2071542004 hasConcept C199360897 @default.
- W2071542004 hasConcept C21959979 @default.
- W2071542004 hasConcept C23123220 @default.
- W2071542004 hasConcept C2524010 @default.
- W2071542004 hasConcept C2777210771 @default.
- W2071542004 hasConcept C2778112365 @default.
- W2071542004 hasConcept C30088001 @default.
- W2071542004 hasConcept C33923547 @default.
- W2071542004 hasConcept C41008148 @default.
- W2071542004 hasConcept C42812 @default.
- W2071542004 hasConcept C54355233 @default.
- W2071542004 hasConcept C86803240 @default.
- W2071542004 hasConcept C8797682 @default.
- W2071542004 hasConceptScore W2071542004C114614502 @default.
- W2071542004 hasConceptScore W2071542004C124101348 @default.
- W2071542004 hasConceptScore W2071542004C136764020 @default.
- W2071542004 hasConceptScore W2071542004C137922610 @default.
- W2071542004 hasConceptScore W2071542004C154504017 @default.
- W2071542004 hasConceptScore W2071542004C199360897 @default.
- W2071542004 hasConceptScore W2071542004C21959979 @default.
- W2071542004 hasConceptScore W2071542004C23123220 @default.
- W2071542004 hasConceptScore W2071542004C2524010 @default.
- W2071542004 hasConceptScore W2071542004C2777210771 @default.
- W2071542004 hasConceptScore W2071542004C2778112365 @default.
- W2071542004 hasConceptScore W2071542004C30088001 @default.
- W2071542004 hasConceptScore W2071542004C33923547 @default.
- W2071542004 hasConceptScore W2071542004C41008148 @default.
- W2071542004 hasConceptScore W2071542004C42812 @default.
- W2071542004 hasConceptScore W2071542004C54355233 @default.
- W2071542004 hasConceptScore W2071542004C86803240 @default.
- W2071542004 hasConceptScore W2071542004C8797682 @default.
- W2071542004 hasLocation W20715420041 @default.
- W2071542004 hasOpenAccess W2071542004 @default.
- W2071542004 hasPrimaryLocation W20715420041 @default.
- W2071542004 hasRelatedWork W1500626914 @default.
- W2071542004 hasRelatedWork W1514708291 @default.
- W2071542004 hasRelatedWork W1556894713 @default.
- W2071542004 hasRelatedWork W1822474999 @default.
- W2071542004 hasRelatedWork W1974725911 @default.
- W2071542004 hasRelatedWork W2019264297 @default.
- W2071542004 hasRelatedWork W2034607275 @default.