Matches in SemOpenAlex for { <https://semopenalex.org/work/W2018022579> ?p ?o ?g. }
- W2018022579 endingPage "512" @default.
- W2018022579 startingPage "499" @default.
- W2018022579 abstract "Webpages are mainly distinguished by their topic (e.g., politics, sports, etc.) and genre (e.g., blogs, homepages, e-shops, etc.). Automatic detection of webpage genre could considerably enhance the ability of modern search engines to focus on the requirements of the user’s information need. In this paper, we present an approach to webpage genre detection based on a fully-automated extraction of the feature set that represents the style of webpages. The features we propose (character n-grams of variable length and HTML tags) are language-independent and easily-extracted while they can be adapted to the properties of the still evolving web genres and the noisy environment of the web. Experiments based on two publicly-available corpora show that the performance of the proposed approach is superior in comparison to previously reported results. It is also shown that character n-grams are better features than words when the dimensionality increases while the binary representation is more effective than the term-frequency representation for both feature types. Moreover, we perform a series of cross-check experiments (e.g., training using a genre palette and testing using a different genre palette as well as using the features extracted from one corpus to discriminate the genres of the other corpus) to illustrate the robustness of our approach and its ability to capture the general stylistic properties of genre categories even when the feature set is not optimized for the given corpus." @default.
- W2018022579 created "2016-06-24" @default.
- W2018022579 creator A5000360020 @default.
- W2018022579 creator A5061737423 @default.
- W2018022579 date "2009-09-01" @default.
- W2018022579 modified "2023-09-23" @default.
- W2018022579 title "Learning to recognize webpage genres" @default.
- W2018022579 cites W1500895378 @default.
- W2018022579 cites W1568297139 @default.
- W2018022579 cites W1997083791 @default.
- W2018022579 cites W2027582570 @default.
- W2018022579 cites W2051391088 @default.
- W2018022579 cites W2071238250 @default.
- W2018022579 cites W2098181468 @default.
- W2018022579 cites W2107682024 @default.
- W2018022579 cites W2117190086 @default.
- W2018022579 cites W2118020653 @default.
- W2018022579 cites W2118201046 @default.
- W2018022579 cites W2124890966 @default.
- W2018022579 cites W2138326572 @default.
- W2018022579 cites W2141865154 @default.
- W2018022579 cites W2146888100 @default.
- W2018022579 cites W2149684865 @default.
- W2018022579 cites W2155085300 @default.
- W2018022579 cites W4238608053 @default.
- W2018022579 doi "https://doi.org/10.1016/j.ipm.2009.05.003" @default.
- W2018022579 hasPublicationYear "2009" @default.
- W2018022579 type Work @default.
- W2018022579 sameAs 2018022579 @default.
- W2018022579 citedByCount "41" @default.
- W2018022579 countsByYear W20180225792012 @default.
- W2018022579 countsByYear W20180225792013 @default.
- W2018022579 countsByYear W20180225792014 @default.
- W2018022579 countsByYear W20180225792015 @default.
- W2018022579 countsByYear W20180225792016 @default.
- W2018022579 countsByYear W20180225792018 @default.
- W2018022579 countsByYear W20180225792019 @default.
- W2018022579 countsByYear W20180225792020 @default.
- W2018022579 countsByYear W20180225792021 @default.
- W2018022579 countsByYear W20180225792022 @default.
- W2018022579 countsByYear W20180225792023 @default.
- W2018022579 crossrefType "journal-article" @default.
- W2018022579 hasAuthorship W2018022579A5000360020 @default.
- W2018022579 hasAuthorship W2018022579A5061737423 @default.
- W2018022579 hasConcept C111919701 @default.
- W2018022579 hasConcept C120665830 @default.
- W2018022579 hasConcept C121332964 @default.
- W2018022579 hasConcept C136764020 @default.
- W2018022579 hasConcept C138885662 @default.
- W2018022579 hasConcept C154945302 @default.
- W2018022579 hasConcept C177264268 @default.
- W2018022579 hasConcept C17744445 @default.
- W2018022579 hasConcept C192209626 @default.
- W2018022579 hasConcept C199360897 @default.
- W2018022579 hasConcept C199539241 @default.
- W2018022579 hasConcept C204321447 @default.
- W2018022579 hasConcept C21959979 @default.
- W2018022579 hasConcept C23123220 @default.
- W2018022579 hasConcept C2776359362 @default.
- W2018022579 hasConcept C2776401178 @default.
- W2018022579 hasConcept C2779674283 @default.
- W2018022579 hasConcept C41008148 @default.
- W2018022579 hasConcept C41895202 @default.
- W2018022579 hasConcept C52622490 @default.
- W2018022579 hasConcept C94625758 @default.
- W2018022579 hasConceptScore W2018022579C111919701 @default.
- W2018022579 hasConceptScore W2018022579C120665830 @default.
- W2018022579 hasConceptScore W2018022579C121332964 @default.
- W2018022579 hasConceptScore W2018022579C136764020 @default.
- W2018022579 hasConceptScore W2018022579C138885662 @default.
- W2018022579 hasConceptScore W2018022579C154945302 @default.
- W2018022579 hasConceptScore W2018022579C177264268 @default.
- W2018022579 hasConceptScore W2018022579C17744445 @default.
- W2018022579 hasConceptScore W2018022579C192209626 @default.
- W2018022579 hasConceptScore W2018022579C199360897 @default.
- W2018022579 hasConceptScore W2018022579C199539241 @default.
- W2018022579 hasConceptScore W2018022579C204321447 @default.
- W2018022579 hasConceptScore W2018022579C21959979 @default.
- W2018022579 hasConceptScore W2018022579C23123220 @default.
- W2018022579 hasConceptScore W2018022579C2776359362 @default.
- W2018022579 hasConceptScore W2018022579C2776401178 @default.
- W2018022579 hasConceptScore W2018022579C2779674283 @default.
- W2018022579 hasConceptScore W2018022579C41008148 @default.
- W2018022579 hasConceptScore W2018022579C41895202 @default.
- W2018022579 hasConceptScore W2018022579C52622490 @default.
- W2018022579 hasConceptScore W2018022579C94625758 @default.
- W2018022579 hasIssue "5" @default.
- W2018022579 hasLocation W20180225791 @default.
- W2018022579 hasOpenAccess W2018022579 @default.
- W2018022579 hasPrimaryLocation W20180225791 @default.
- W2018022579 hasRelatedWork W1488266984 @default.
- W2018022579 hasRelatedWork W1509467138 @default.
- W2018022579 hasRelatedWork W2128719260 @default.
- W2018022579 hasRelatedWork W2411679502 @default.
- W2018022579 hasRelatedWork W2916492174 @default.
- W2018022579 hasRelatedWork W3107474891 @default.
- W2018022579 hasRelatedWork W4210656569 @default.
- W2018022579 hasRelatedWork W50774052 @default.
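
The abstract of W2018022579 contrasts a binary representation with a term-frequency representation over character n-grams of variable length. The sketch below illustrates only that distinction and is not the authors' implementation. The function names, the 3-to-5 length range, and the choice of keeping the most frequent n-grams as the vocabulary are all assumptions made for illustration; the record does not state how the paper selects its features, and the HTML-tag features it mentions are left out.

```python
from collections import Counter


def char_ngrams(text, n_min=3, n_max=5):
    """Yield every character n-gram of length n_min..n_max (range is an assumption)."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def build_vocabulary(pages, size, n_min=3, n_max=5):
    """Keep the `size` most frequent n-grams across all pages.

    Frequency-based selection is a stand-in chosen for this sketch,
    not the selection method of the paper.
    """
    counts = Counter()
    for page in pages:
        counts.update(char_ngrams(page, n_min, n_max))
    return [gram for gram, _ in counts.most_common(size)]


def vectorize(page, vocabulary, binary=True, n_min=3, n_max=5):
    """Map one page onto the vocabulary.

    binary=True  -> 1 if the n-gram occurs in the page, else 0
    binary=False -> raw occurrence count (term frequency)
    """
    counts = Counter(char_ngrams(page, n_min, n_max))
    if binary:
        return [1 if counts[gram] > 0 else 0 for gram in vocabulary]
    return [counts[gram] for gram in vocabulary]
```

Calling `vectorize(page, vocabulary, binary=False)` on the same page and vocabulary gives the term-frequency vector, so the two representations can be compared with any classifier.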