Matches in SemOpenAlex for { <https://semopenalex.org/work/W106628021> ?p ?o ?g. }
Showing items 1 to 90 of 90 with 100 items per page.
- W106628021 abstract "Web page clustering is a fundamental technique for managing, locating and interpreting Web data and for helping users with navigation, discrimination and understanding. Most existing clustering algorithms do not adapt well to Web clustering in terms of efficiency and effectiveness. Combining contents analysis with hyperlink structure analysis has proven to be a better approach. However, how to combine these two features of different nature effectively in clustering remains an open problem, and there is still little work on it. In this paper, we present an experimental study on enhancing the coupling of link and contents analysis of Web pages for robust clustering. In particular, we introduce two techniques, in-link reinforcement and anchor window analysis, to improve the adaptability of contents-link coupled clustering. Our detailed evaluation indicates that these techniques can effectively improve the quality of Web page clustering for a wide range of topics. 1. Introduction There are more than 2 billion pages on the Web, not counting the so-called hidden Web pages that can be generated from underlying databases. At the same time, more than 100 million pages become obsolete every month. Locating truly needed Web pages and interpreting them appropriately is a big challenge faced by researchers in the fields of databases, Information Retrieval (IR) and data mining. So, correctly clustering both the source Web pages and the results of search engines is very important for helping end users with navigation, discrimination, summarization and interpretation of the Web. Most existing and well-cited topic directories, such as Yahoo! (www.yahoo.com) and the Open Directory (www.dmoz.com), are mainly created and maintained manually by domain experts.
Therefore, those topic directories cover only a very small portion of the whole Web, owing to the extremely low scalability of manual creation and maintenance. They are also often outdated, as the Web changes all the time. Some topics also have no corresponding sub-categories in Yahoo! or the Open Directory. Such unsatisfactory performance calls for semi-automatic or automatic clustering of Web pages that scales well and can follow the evolution of the Web. Document clustering has been well studied in the field of traditional IR. The most commonly used techniques are developed under the vector-space model. Under this" @default.
- W106628021 created "2016-06-24" @default.
- W106628021 creator A5024982238 @default.
- W106628021 creator A5056438865 @default.
- W106628021 date "2004-01-01" @default.
- W106628021 modified "2023-09-22" @default.
- W106628021 title "Enhancing Contents-Link Coupled Web Page Clustering and Its Evaluation" @default.
- W106628021 cites W192724328 @default.
- W106628021 cites W1970859146 @default.
- W106628021 cites W1981202432 @default.
- W106628021 cites W1987777228 @default.
- W106628021 cites W1996764654 @default.
- W106628021 cites W2005207065 @default.
- W106628021 cites W2014351296 @default.
- W106628021 cites W2020423193 @default.
- W106628021 cites W2066636486 @default.
- W106628021 cites W2085950095 @default.
- W106628021 cites W2095150974 @default.
- W106628021 cites W2100958137 @default.
- W106628021 cites W2121996546 @default.
- W106628021 cites W2151626491 @default.
- W106628021 cites W2152565070 @default.
- W106628021 cites W2474187693 @default.
- W106628021 cites W2999729612 @default.
- W106628021 hasPublicationYear "2004" @default.
- W106628021 type Work @default.
- W106628021 sameAs 106628021 @default.
- W106628021 citedByCount "3" @default.
- W106628021 crossrefType "journal-article" @default.
- W106628021 hasAuthorship W106628021A5024982238 @default.
- W106628021 hasAuthorship W106628021A5056438865 @default.
- W106628021 hasConcept C124101348 @default.
- W106628021 hasConcept C130436687 @default.
- W106628021 hasConcept C136764020 @default.
- W106628021 hasConcept C154945302 @default.
- W106628021 hasConcept C162005631 @default.
- W106628021 hasConcept C170858558 @default.
- W106628021 hasConcept C173576120 @default.
- W106628021 hasConcept C195409031 @default.
- W106628021 hasConcept C197046077 @default.
- W106628021 hasConcept C21959979 @default.
- W106628021 hasConcept C23123220 @default.
- W106628021 hasConcept C24733836 @default.
- W106628021 hasConcept C30088001 @default.
- W106628021 hasConcept C41008148 @default.
- W106628021 hasConcept C61096286 @default.
- W106628021 hasConcept C73555534 @default.
- W106628021 hasConceptScore W106628021C124101348 @default.
- W106628021 hasConceptScore W106628021C130436687 @default.
- W106628021 hasConceptScore W106628021C136764020 @default.
- W106628021 hasConceptScore W106628021C154945302 @default.
- W106628021 hasConceptScore W106628021C162005631 @default.
- W106628021 hasConceptScore W106628021C170858558 @default.
- W106628021 hasConceptScore W106628021C173576120 @default.
- W106628021 hasConceptScore W106628021C195409031 @default.
- W106628021 hasConceptScore W106628021C197046077 @default.
- W106628021 hasConceptScore W106628021C21959979 @default.
- W106628021 hasConceptScore W106628021C23123220 @default.
- W106628021 hasConceptScore W106628021C24733836 @default.
- W106628021 hasConceptScore W106628021C30088001 @default.
- W106628021 hasConceptScore W106628021C41008148 @default.
- W106628021 hasConceptScore W106628021C61096286 @default.
- W106628021 hasConceptScore W106628021C73555534 @default.
- W106628021 hasLocation W1066280211 @default.
- W106628021 hasOpenAccess W106628021 @default.
- W106628021 hasPrimaryLocation W1066280211 @default.
- W106628021 hasRelatedWork W138719796 @default.
- W106628021 hasRelatedWork W1507164096 @default.
- W106628021 hasRelatedWork W1543293838 @default.
- W106628021 hasRelatedWork W1656596862 @default.
- W106628021 hasRelatedWork W16977880 @default.
- W106628021 hasRelatedWork W2072157417 @default.
- W106628021 hasRelatedWork W2087473891 @default.
- W106628021 hasRelatedWork W2111956124 @default.
- W106628021 hasRelatedWork W2119174985 @default.
- W106628021 hasRelatedWork W2123809929 @default.
- W106628021 hasRelatedWork W2152791334 @default.
- W106628021 hasRelatedWork W2182304450 @default.
- W106628021 hasRelatedWork W2236845972 @default.
- W106628021 hasRelatedWork W2315161676 @default.
- W106628021 hasRelatedWork W2402158498 @default.
- W106628021 hasRelatedWork W2555697458 @default.
- W106628021 hasRelatedWork W628551565 @default.
- W106628021 hasRelatedWork W2130393995 @default.
- W106628021 hasRelatedWork W2186006898 @default.
- W106628021 hasRelatedWork W2186888517 @default.
- W106628021 isParatext "false" @default.
- W106628021 isRetracted "false" @default.
- W106628021 magId "106628021" @default.
- W106628021 workType "article" @default.