Matches in SemOpenAlex for { <https://semopenalex.org/work/W2029345326> ?p ?o ?g. }
- W2029345326 abstract "Vertical search is an important topic in search engine design because, compared with large-scale search engines such as Google and Baidu, it offers more abundant and more precise results in a specific domain. Prior to this paper, most vertical search engines were built from manually selected and edited materials, which was time-consuming and costly. In this paper, we propose a new information resource discovery model and build a crawler for the vertical search engine that selectively fetches webpages relevant to a predefined topic. The model has three aspects. First, webpages are transformed into term vectors. TF-TUF, short for Term Frequency-Topic Unbalanced Factor, is proposed as the weighting scheme in the vector space model. In this scheme, we put more weight on terms whose frequencies differ greatly among topics, since we believe such terms contribute more to topic prediction. Second, we use a Bayes method, trained in advance on topic-labeled text, to predict the topics of webpages. The specific method for using Bayes to predict topics is described in the algorithm section. Third, we create a focused crawler using the topic prediction results. The prediction results are used not only to filter out irrelevant webpages but also to direct the crawler toward the areas most likely to be topic-relevant. Together, the three aspects achieve the goal of efficiently discovering topic-relevant materials on the web when building a vertical search engine. Our experiment shows that the average prediction accuracy of the proposed model exceeds 85%. In application, we also used the proposed model to build Search Engine for S&T (http://nstr.com.cn/search), a vertical search engine in the science field." @default.
- W2029345326 created "2016-06-24" @default.
- W2029345326 creator A5017001198 @default.
- W2029345326 creator A5055446742 @default.
- W2029345326 date "2014-10-01" @default.
- W2029345326 modified "2023-09-24" @default.
- W2029345326 title "Bayes topic prediction model for focused crawling of vertical search engine" @default.
- W2029345326 cites W1599395313 @default.
- W2029345326 cites W1661523325 @default.
- W2029345326 cites W2009438923 @default.
- W2029345326 cites W2083089853 @default.
- W2029345326 cites W2097246321 @default.
- W2029345326 cites W2154159734 @default.
- W2029345326 cites W2170236196 @default.
- W2029345326 cites W2235107487 @default.
- W2029345326 cites W3101427803 @default.
- W2029345326 cites W60298443 @default.
- W2029345326 cites W633723564 @default.
- W2029345326 cites W2225462730 @default.
- W2029345326 doi "https://doi.org/10.1109/comcomap.2014.7017213" @default.
- W2029345326 hasPublicationYear "2014" @default.
- W2029345326 type Work @default.
- W2029345326 sameAs 2029345326 @default.
- W2029345326 citedByCount "0" @default.
- W2029345326 crossrefType "proceedings-article" @default.
- W2029345326 hasAuthorship W2029345326A5017001198 @default.
- W2029345326 hasAuthorship W2029345326A5055446742 @default.
- W2029345326 hasConcept C100368936 @default.
- W2029345326 hasConcept C105702510 @default.
- W2029345326 hasConcept C124101348 @default.
- W2029345326 hasConcept C136764020 @default.
- W2029345326 hasConcept C13743948 @default.
- W2029345326 hasConcept C14838553 @default.
- W2029345326 hasConcept C164120249 @default.
- W2029345326 hasConcept C173576120 @default.
- W2029345326 hasConcept C206345919 @default.
- W2029345326 hasConcept C21959979 @default.
- W2029345326 hasConcept C23123220 @default.
- W2029345326 hasConcept C31258907 @default.
- W2029345326 hasConcept C41008148 @default.
- W2029345326 hasConcept C61096286 @default.
- W2029345326 hasConcept C71924100 @default.
- W2029345326 hasConcept C73340581 @default.
- W2029345326 hasConcept C75165309 @default.
- W2029345326 hasConcept C97854310 @default.
- W2029345326 hasConceptScore W2029345326C100368936 @default.
- W2029345326 hasConceptScore W2029345326C105702510 @default.
- W2029345326 hasConceptScore W2029345326C124101348 @default.
- W2029345326 hasConceptScore W2029345326C136764020 @default.
- W2029345326 hasConceptScore W2029345326C13743948 @default.
- W2029345326 hasConceptScore W2029345326C14838553 @default.
- W2029345326 hasConceptScore W2029345326C164120249 @default.
- W2029345326 hasConceptScore W2029345326C173576120 @default.
- W2029345326 hasConceptScore W2029345326C206345919 @default.
- W2029345326 hasConceptScore W2029345326C21959979 @default.
- W2029345326 hasConceptScore W2029345326C23123220 @default.
- W2029345326 hasConceptScore W2029345326C31258907 @default.
- W2029345326 hasConceptScore W2029345326C41008148 @default.
- W2029345326 hasConceptScore W2029345326C61096286 @default.
- W2029345326 hasConceptScore W2029345326C71924100 @default.
- W2029345326 hasConceptScore W2029345326C73340581 @default.
- W2029345326 hasConceptScore W2029345326C75165309 @default.
- W2029345326 hasConceptScore W2029345326C97854310 @default.
- W2029345326 hasLocation W20293453261 @default.
- W2029345326 hasOpenAccess W2029345326 @default.
- W2029345326 hasPrimaryLocation W20293453261 @default.
- W2029345326 hasRelatedWork W1502604908 @default.
- W2029345326 hasRelatedWork W1827481849 @default.
- W2029345326 hasRelatedWork W1859338565 @default.
- W2029345326 hasRelatedWork W1958014849 @default.
- W2029345326 hasRelatedWork W1974306718 @default.
- W2029345326 hasRelatedWork W2103052078 @default.
- W2029345326 hasRelatedWork W2132107051 @default.
- W2029345326 hasRelatedWork W2144959234 @default.
- W2029345326 hasRelatedWork W2152057064 @default.
- W2029345326 hasRelatedWork W2355129779 @default.
- W2029345326 hasRelatedWork W2359166167 @default.
- W2029345326 hasRelatedWork W2366240838 @default.
- W2029345326 hasRelatedWork W2374021970 @default.
- W2029345326 hasRelatedWork W2538203441 @default.
- W2029345326 hasRelatedWork W2539245285 @default.
- W2029345326 hasRelatedWork W2548511917 @default.
- W2029345326 hasRelatedWork W2601972157 @default.
- W2029345326 hasRelatedWork W2888144270 @default.
- W2029345326 hasRelatedWork W2927916536 @default.
- W2029345326 hasRelatedWork W3142800404 @default.
- W2029345326 isParatext "false" @default.
- W2029345326 isRetracted "false" @default.
- W2029345326 magId "2029345326" @default.
- W2029345326 workType "article" @default.
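The abstract names TF-TUF (Term Frequency-Topic Unbalanced Factor) as its weighting scheme but does not give its formula. As a minimal sketch, assuming (hypothetically) that the unbalanced factor can be approximated by the coefficient of variation of a term's relative frequency across topics, a TF-TUF-style weighting could look like this. The helper names `topic_term_rates`, `unbalance_factor`, and `tf_tuf` are illustrative and are not taken from the paper.

```python
from collections import Counter
from statistics import mean, pstdev


def topic_term_rates(topic_docs):
    """Relative frequency of every term within each topic's training text.

    topic_docs maps topic -> list of token lists.
    """
    rates = {}
    for topic, docs in topic_docs.items():
        counts = Counter(tok for doc in docs for tok in doc)
        total = sum(counts.values())
        rates[topic] = {t: c / total for t, c in counts.items()}
    return rates


def unbalance_factor(term, rates):
    """HYPOTHETICAL topic unbalanced factor: coefficient of variation of the
    term's relative frequency across topics. Terms spread evenly over all
    topics score 0; terms concentrated in few topics score higher."""
    values = [r.get(term, 0.0) for r in rates.values()]
    avg = mean(values)
    return pstdev(values) / avg if avg > 0 else 0.0


def tf_tuf(doc_tokens, rates):
    """Weight each term by raw term frequency times the unbalance factor."""
    tf = Counter(doc_tokens)
    return {t: n * unbalance_factor(t, rates) for t, n in tf.items()}
```

With two topics, a word such as "the" that occurs equally often in both gets weight 0, while a word that appears in only one topic keeps its full term frequency, which matches the abstract's intent of favoring terms whose frequencies differ among topics.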
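For the second aspect, the abstract says only that a "Bayes method" trained on topic-labeled text predicts each page's topic. A common stand-in is multinomial naive Bayes with Laplace smoothing. The sketch below assumes that choice; the class name `NaiveBayesTopics` and its methods are hypothetical, and the paper's own formulation (given in its algorithm section) may differ.

```python
import math
from collections import Counter


class NaiveBayesTopics:
    """Multinomial naive Bayes topic classifier with Laplace smoothing.

    HYPOTHETICAL stand-in for the paper's "Bayes method", not the
    authors' exact model.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha smoothing constant

    def fit(self, labeled_docs):
        """Train on an iterable of (topic, tokens) pairs."""
        self.doc_counts = Counter()  # documents per topic, used for the prior
        self.term_counts = {}        # topic -> Counter of term occurrences
        for topic, tokens in labeled_docs:
            self.doc_counts[topic] += 1
            self.term_counts.setdefault(topic, Counter()).update(tokens)
        self.vocab = set()
        for counts in self.term_counts.values():
            self.vocab.update(counts)
        self.totals = {t: sum(c.values()) for t, c in self.term_counts.items()}
        return self

    def log_scores(self, tokens):
        """Unnormalized log P(topic) + sum of log P(term | topic)."""
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        scores = {}
        for topic, counts in self.term_counts.items():
            score = math.log(self.doc_counts[topic] / n_docs)
            denom = self.totals[topic] + self.alpha * v
            for tok in tokens:
                if tok in self.vocab:  # ignore terms never seen in training
                    score += math.log((counts[tok] + self.alpha) / denom)
            scores[topic] = score
        return scores

    def posterior(self, tokens):
        """Normalized P(topic | tokens), computed stably via log-sum-exp."""
        scores = self.log_scores(tokens)
        top = max(scores.values())
        weights = {t: math.exp(s - top) for t, s in scores.items()}
        z = sum(weights.values())
        return {t: w / z for t, w in weights.items()}

    def predict(self, tokens):
        """Return the most probable topic."""
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)
```

Working in log space avoids floating-point underflow when a page contains many terms; `posterior` exposes a probability that a crawler can compare against a relevance threshold.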
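For the third aspect, a standard way to let topic predictions both filter pages and steer a crawler is a best-first frontier, where each newly discovered URL is prioritized by the relevance score of the page it was found on. The sketch below assumes that design. `focused_crawl`, `fetch`, and `relevance` are hypothetical names: `fetch(url)` must return `(tokens, outlinks)` and `relevance(tokens)` must return a score in [0, 1] (for example, a posterior probability from a Bayes classifier). No real HTTP fetching is performed.

```python
import heapq


def focused_crawl(seeds, fetch, relevance, threshold=0.5, max_pages=100):
    """Best-first focused crawl over caller-supplied callables.

    Returns a list of (url, score) for pages judged on-topic, in crawl order.
    """
    # heapq is a min-heap, so scores are negated to pop the best link first.
    frontier = [(-1.0, url) for url in seeds]  # seeds get top priority
    heapq.heapify(frontier)
    seen = set(seeds)
    kept = []
    fetched = 0
    while frontier and fetched < max_pages:
        _, url = heapq.heappop(frontier)
        tokens, outlinks = fetch(url)
        fetched += 1
        score = relevance(tokens)
        if score < threshold:
            continue  # filter: drop the page and do not expand its links
        kept.append((url, score))
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                # Links inherit the parent's score, steering the crawl
                # toward regions that have produced relevant pages.
                heapq.heappush(frontier, (-score, link))
    return kept
```

Dropping the outlinks of filtered pages is a "hard focus" choice; a softer variant would still enqueue them at a low priority so that relevant regions reachable only through off-topic pages are not lost.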