Matches in SemOpenAlex for { <https://semopenalex.org/work/W2023851444> ?p ?o ?g. }
Showing items 1 to 89 of 89 with 100 items per page.
- W2023851444 abstract "We propose an approach for gathering web pages written in a specific language. The approach consists of a language predictor and a web site crawler. The language predictor is a machine-learning-based component that learns characteristics of relevant hosts from an example host graph; it is used to calculate the language degree of a web server, that is, how likely the server is to serve web pages written in the target language. The site crawler, in turn, downloads web pages from a prioritized list of relevant servers. We evaluated the crawling performance in terms of coverage and harvest rate. Preliminary experiments on a Thai web data set show promising results compared with the traditional language-specific crawling methods recently proposed in the literature." @default.
- W2023851444 created "2016-06-24" @default.
- W2023851444 creator A5033084294 @default.
- W2023851444 creator A5035414147 @default.
- W2023851444 creator A5076167953 @default.
- W2023851444 date "2010-09-01" @default.
- W2023851444 modified "2023-10-14" @default.
- W2023851444 title "A Machine Learning Based Language Specific Web Site Crawler" @default.
- W2023851444 cites W1489992655 @default.
- W2023851444 cites W1533946607 @default.
- W2023851444 cites W1538372379 @default.
- W2023851444 cites W1854214752 @default.
- W2023851444 cites W1952841266 @default.
- W2023851444 cites W1975051470 @default.
- W2023851444 cites W1988931372 @default.
- W2023851444 cites W2017726337 @default.
- W2023851444 cites W2058113710 @default.
- W2023851444 cites W2113184419 @default.
- W2023851444 cites W2128294438 @default.
- W2023851444 cites W2133990480 @default.
- W2023851444 cites W2150037085 @default.
- W2023851444 cites W2154527276 @default.
- W2023851444 cites W2169263924 @default.
- W2023851444 cites W2171183383 @default.
- W2023851444 doi "https://doi.org/10.1109/nbis.2010.25" @default.
- W2023851444 hasPublicationYear "2010" @default.
- W2023851444 type Work @default.
- W2023851444 sameAs 2023851444 @default.
- W2023851444 citedByCount "12" @default.
- W2023851444 countsByYear W20238514442013 @default.
- W2023851444 countsByYear W20238514442014 @default.
- W2023851444 countsByYear W20238514442015 @default.
- W2023851444 countsByYear W20238514442016 @default.
- W2023851444 countsByYear W20238514442017 @default.
- W2023851444 countsByYear W20238514442018 @default.
- W2023851444 countsByYear W20238514442019 @default.
- W2023851444 crossrefType "proceedings-article" @default.
- W2023851444 hasAuthorship W2023851444A5033084294 @default.
- W2023851444 hasAuthorship W2023851444A5035414147 @default.
- W2023851444 hasAuthorship W2023851444A5076167953 @default.
- W2023851444 hasConcept C100368936 @default.
- W2023851444 hasConcept C105702510 @default.
- W2023851444 hasConcept C110875604 @default.
- W2023851444 hasConcept C11392498 @default.
- W2023851444 hasConcept C136764020 @default.
- W2023851444 hasConcept C13743948 @default.
- W2023851444 hasConcept C154945302 @default.
- W2023851444 hasConcept C173576120 @default.
- W2023851444 hasConcept C204321447 @default.
- W2023851444 hasConcept C21959979 @default.
- W2023851444 hasConcept C23123220 @default.
- W2023851444 hasConcept C2780154274 @default.
- W2023851444 hasConcept C41008148 @default.
- W2023851444 hasConcept C61096286 @default.
- W2023851444 hasConcept C71924100 @default.
- W2023851444 hasConcept C73340581 @default.
- W2023851444 hasConceptScore W2023851444C100368936 @default.
- W2023851444 hasConceptScore W2023851444C105702510 @default.
- W2023851444 hasConceptScore W2023851444C110875604 @default.
- W2023851444 hasConceptScore W2023851444C11392498 @default.
- W2023851444 hasConceptScore W2023851444C136764020 @default.
- W2023851444 hasConceptScore W2023851444C13743948 @default.
- W2023851444 hasConceptScore W2023851444C154945302 @default.
- W2023851444 hasConceptScore W2023851444C173576120 @default.
- W2023851444 hasConceptScore W2023851444C204321447 @default.
- W2023851444 hasConceptScore W2023851444C21959979 @default.
- W2023851444 hasConceptScore W2023851444C23123220 @default.
- W2023851444 hasConceptScore W2023851444C2780154274 @default.
- W2023851444 hasConceptScore W2023851444C41008148 @default.
- W2023851444 hasConceptScore W2023851444C61096286 @default.
- W2023851444 hasConceptScore W2023851444C71924100 @default.
- W2023851444 hasConceptScore W2023851444C73340581 @default.
- W2023851444 hasLocation W20238514441 @default.
- W2023851444 hasOpenAccess W2023851444 @default.
- W2023851444 hasPrimaryLocation W20238514441 @default.
- W2023851444 hasRelatedWork W2051135816 @default.
- W2023851444 hasRelatedWork W2084927340 @default.
- W2023851444 hasRelatedWork W2100464657 @default.
- W2023851444 hasRelatedWork W2152505903 @default.
- W2023851444 hasRelatedWork W2161927007 @default.
- W2023851444 hasRelatedWork W2186697381 @default.
- W2023851444 hasRelatedWork W2277785728 @default.
- W2023851444 hasRelatedWork W2804548096 @default.
- W2023851444 hasRelatedWork W3164053708 @default.
- W2023851444 hasRelatedWork W3216588747 @default.
- W2023851444 isParatext "false" @default.
- W2023851444 isRetracted "false" @default.
- W2023851444 magId "2023851444" @default.
- W2023851444 workType "article" @default.
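The abstract describes a site crawler that downloads pages from servers ranked by their predicted language degree, evaluated by harvest rate and coverage. Below is a minimal, hypothetical sketch of that prioritization and the two metrics. It is not the authors' implementation. It makes these assumptions:

- The language degrees come from some already-trained predictor and are supplied as plain scores.
- A host is treated as relevant when its degree is at least a threshold. The threshold value is an assumption (`threshold=0.5`).
- Each page carries a ground-truth flag indicating whether it is in the target language.
- The metrics use the usual focused-crawling definitions: harvest rate is relevant pages downloaded divided by all pages downloaded, and coverage is relevant pages downloaded divided by all relevant pages.

The function names (`prioritize_hosts`, `simulated_crawl`) are illustrative only.

```python
import heapq


def prioritize_hosts(language_degree, threshold=0.5):
    """Return hosts whose (assumed) language degree >= threshold, highest first.

    Ties are broken alphabetically by host name, since the heap holds
    (-degree, host) tuples.
    """
    heap = [(-degree, host) for host, degree in language_degree.items()
            if degree >= threshold]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def simulated_crawl(hosts_pages, language_degree, budget, threshold=0.5):
    """Crawl prioritized hosts until `budget` pages are downloaded.

    hosts_pages maps host -> list of booleans, True if the page is in the
    target language (hypothetical ground truth). Returns (harvest, coverage).
    """
    downloaded = []
    for host in prioritize_hosts(language_degree, threshold):
        for is_target in hosts_pages.get(host, []):
            if len(downloaded) >= budget:
                break
            downloaded.append(is_target)
        if len(downloaded) >= budget:
            break

    relevant = sum(downloaded)
    total_relevant = sum(sum(pages) for pages in hosts_pages.values())
    harvest = relevant / len(downloaded) if downloaded else 0.0
    coverage = relevant / total_relevant if total_relevant else 0.0
    return harvest, coverage


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    pages = {"a.th": [True, True, False], "b.com": [False, False], "c.th": [True, True]}
    degrees = {"a.th": 0.9, "b.com": 0.1, "c.th": 0.7}
    print(prioritize_hosts(degrees))
    print(simulated_crawl(pages, degrees, budget=4))
```

With the toy data, `b.com` falls below the threshold and is never visited. With a budget of 4 pages, the crawl takes three pages from `a.th` and one from `c.th`, so 3 of 4 downloads are relevant and 3 of the 4 relevant pages are found. A real crawler would instead score hosts discovered during the crawl and re-rank its frontier as it goes.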