Matches in SemOpenAlex for { <https://semopenalex.org/work/W2044012392> ?p ?o ?g. }
Showing items 1 to 95 of 95 with 100 items per page.
- W2044012392 abstract "Online news provides a convenient way for users to read the latest news. Building an online news corpus is important for many text mining and data mining tasks. Creating web news data requires constructing a set of HTML parsing rules to identify the content text. When a website rapidly changes its layout style, the parsing rules (wrapper) must be reconstructed. In this paper, we address this issue and propose a news content recognition algorithm that is portable to different languages and various domains. Our method first scans the entire HTML document and detects a set of candidate blocks. Second, a weighted scoring function that combines stop word language models and HTML penalty functions is used to rank the importance of each candidate. We then check the block that obtains the highest score against a predefined threshold value. To validate the approach, we conduct experiments using 533 online news HTML files from 24 web sites. The empirical study shows that our method achieves ~95% macro F-measure in recognizing news content." @default.
- W2044012392 created "2016-06-24" @default.
- W2044012392 creator A5041024748 @default.
- W2044012392 creator A5062807562 @default.
- W2044012392 date "2012-09-01" @default.
- W2044012392 modified "2023-09-27" @default.
- W2044012392 title "A Template Independent Method for Large Online News Content Extraction" @default.
- W2044012392 cites W1485852130 @default.
- W2044012392 cites W1551178608 @default.
- W2044012392 cites W1972594981 @default.
- W2044012392 cites W1975700640 @default.
- W2044012392 cites W1992657934 @default.
- W2044012392 cites W2015079391 @default.
- W2044012392 cites W2025494197 @default.
- W2044012392 cites W2037244798 @default.
- W2044012392 cites W2040574236 @default.
- W2044012392 cites W2061834489 @default.
- W2044012392 cites W2106819216 @default.
- W2044012392 cites W2109410256 @default.
- W2044012392 cites W2131462252 @default.
- W2044012392 cites W2152056009 @default.
- W2044012392 cites W2153635508 @default.
- W2044012392 cites W2160189941 @default.
- W2044012392 cites W2160819161 @default.
- W2044012392 cites W2168596788 @default.
- W2044012392 cites W2171364811 @default.
- W2044012392 cites W24642091 @default.
- W2044012392 cites W3117877 @default.
- W2044012392 doi "https://doi.org/10.1109/iiai-aai.2012.58" @default.
- W2044012392 hasPublicationYear "2012" @default.
- W2044012392 type Work @default.
- W2044012392 sameAs 2044012392 @default.
- W2044012392 citedByCount "2" @default.
- W2044012392 countsByYear W20440123922013 @default.
- W2044012392 countsByYear W20440123922018 @default.
- W2044012392 crossrefType "proceedings-article" @default.
- W2044012392 hasAuthorship W2044012392A5041024748 @default.
- W2044012392 hasAuthorship W2044012392A5062807562 @default.
- W2044012392 hasConcept C114614502 @default.
- W2044012392 hasConcept C124101348 @default.
- W2044012392 hasConcept C136764020 @default.
- W2044012392 hasConcept C138885662 @default.
- W2044012392 hasConcept C14036430 @default.
- W2044012392 hasConcept C164226766 @default.
- W2044012392 hasConcept C177264268 @default.
- W2044012392 hasConcept C186644900 @default.
- W2044012392 hasConcept C199360897 @default.
- W2044012392 hasConcept C204321447 @default.
- W2044012392 hasConcept C21959979 @default.
- W2044012392 hasConcept C23123220 @default.
- W2044012392 hasConcept C2780009758 @default.
- W2044012392 hasConcept C33923547 @default.
- W2044012392 hasConcept C41008148 @default.
- W2044012392 hasConcept C41895202 @default.
- W2044012392 hasConcept C78458016 @default.
- W2044012392 hasConcept C81639021 @default.
- W2044012392 hasConcept C86803240 @default.
- W2044012392 hasConcept C90805587 @default.
- W2044012392 hasConceptScore W2044012392C114614502 @default.
- W2044012392 hasConceptScore W2044012392C124101348 @default.
- W2044012392 hasConceptScore W2044012392C136764020 @default.
- W2044012392 hasConceptScore W2044012392C138885662 @default.
- W2044012392 hasConceptScore W2044012392C14036430 @default.
- W2044012392 hasConceptScore W2044012392C164226766 @default.
- W2044012392 hasConceptScore W2044012392C177264268 @default.
- W2044012392 hasConceptScore W2044012392C186644900 @default.
- W2044012392 hasConceptScore W2044012392C199360897 @default.
- W2044012392 hasConceptScore W2044012392C204321447 @default.
- W2044012392 hasConceptScore W2044012392C21959979 @default.
- W2044012392 hasConceptScore W2044012392C23123220 @default.
- W2044012392 hasConceptScore W2044012392C2780009758 @default.
- W2044012392 hasConceptScore W2044012392C33923547 @default.
- W2044012392 hasConceptScore W2044012392C41008148 @default.
- W2044012392 hasConceptScore W2044012392C41895202 @default.
- W2044012392 hasConceptScore W2044012392C78458016 @default.
- W2044012392 hasConceptScore W2044012392C81639021 @default.
- W2044012392 hasConceptScore W2044012392C86803240 @default.
- W2044012392 hasConceptScore W2044012392C90805587 @default.
- W2044012392 hasLocation W20440123921 @default.
- W2044012392 hasOpenAccess W2044012392 @default.
- W2044012392 hasPrimaryLocation W20440123921 @default.
- W2044012392 hasRelatedWork W1806995473 @default.
- W2044012392 hasRelatedWork W1992419927 @default.
- W2044012392 hasRelatedWork W2108475493 @default.
- W2044012392 hasRelatedWork W2167662847 @default.
- W2044012392 hasRelatedWork W2325114982 @default.
- W2044012392 hasRelatedWork W2411679502 @default.
- W2044012392 hasRelatedWork W2502722637 @default.
- W2044012392 hasRelatedWork W2575932452 @default.
- W2044012392 hasRelatedWork W1551406738 @default.
- W2044012392 hasRelatedWork W2594281132 @default.
- W2044012392 isParatext "false" @default.
- W2044012392 isRetracted "false" @default.
- W2044012392 magId "2044012392" @default.
- W2044012392 workType "article" @default.
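
The abstract above outlines a three-step pipeline: detect candidate blocks, rank them with a weighted score that combines stop word evidence and an HTML penalty, then check the top block against a threshold. The following is a minimal sketch of that general idea, not the authors' implementation. The stop word list, the choice of block tags, the link-density penalty, the weights `w_stop` and `w_penalty`, and the `threshold` default are all hypothetical, since the abstract specifies none of them. A plain stop word count stands in for the paper's stop word language models.

```python
import re
from html.parser import HTMLParser

# Hypothetical stop word list (assumption; the abstract does not give one).
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "that", "on", "with"}

# Hypothetical set of tags treated as candidate blocks (assumption).
BLOCK_TAGS = {"div", "p", "td", "article", "section"}


class BlockCollector(HTMLParser):
    """Collects the text and the amount of link text of each block element."""

    def __init__(self):
        super().__init__()
        self.blocks = []   # finished candidate blocks
        self._stack = []   # blocks that are currently open
        self._in_link = 0  # nesting depth of <a> elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._stack.append({"text": [], "link_chars": 0})
        elif tag == "a":
            self._in_link += 1

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS and self._stack:
            block = self._stack.pop()
            block["text"] = " ".join(block["text"])
            self.blocks.append(block)
        elif tag == "a" and self._in_link:
            self._in_link -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data or not self._stack:
            return
        # Attribute text to the innermost open block only.
        self._stack[-1]["text"].append(data)
        if self._in_link:
            self._stack[-1]["link_chars"] += len(data)


def score_block(block, w_stop=1.0, w_penalty=1.0):
    """Weighted score: stop word count minus a link-density penalty (assumed form)."""
    words = re.findall(r"[a-z']+", block["text"].lower())
    if not words:
        return 0.0
    stop_hits = sum(1 for w in words if w in STOP_WORDS)
    link_ratio = block["link_chars"] / len(block["text"])
    return w_stop * stop_hits - w_penalty * link_ratio * len(words)


def extract_content(html, threshold=3.0):
    """Return the text of the best-scoring block, or None if it misses the threshold."""
    collector = BlockCollector()
    collector.feed(html)
    collector.close()
    if not collector.blocks:
        return None
    best = max(collector.blocks, key=score_block)
    return best["text"] if score_block(best) >= threshold else None


SAMPLE = (
    "<html><body>"
    "<div><a href='/'>Home</a> <a href='/world'>World</a> <a href='/sport'>Sport</a></div>"
    "<div>The mayor said that the city is ready for the storm and that a plan is in place.</div>"
    "</body></html>"
)

if __name__ == "__main__":
    print(extract_content(SAMPLE))
```

In this sketch the navigation block scores below zero because all of its text sits inside links, while the article sentence scores well because it is dense in stop words. A real system would need per-language stop word statistics and tolerant handling of malformed HTML, which this sketch does not attempt.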