Matches in SemOpenAlex for { <https://semopenalex.org/work/W2372091777> ?p ?o ?g. }
Showing items 1 to 81 of 81 with 100 items per page.
- W2372091777 abstract "Most previous works on Web information extraction seldom use associations among Web page blocks. In order to solve this problem, this paper proposes an automatic Web news content extraction approach based on conditional random fields (CRFs). Firstly, it parses a target news page to a DOM tree. After eliminating invalid nodes, pruning subtrees and deleting single nodes in the tree, it uses heuristic rules to segment the DOM tree to blocks and converts these blocks into a data sequence. Then, it defines feature functions to extract each block's own state features and neighbor blocks' category transition features. Finally, by labeling the data sequence based on CRFs, it identifies each block's category to extract the page's content. Experimental results indicate that this approach is precise and adaptable for Web news content extraction, and importing associations among page blocks can improve Web news content extraction." @default.
- W2372091777 created "2016-06-24" @default.
- W2372091777 creator A5054572518 @default.
- W2372091777 date "2011-01-01" @default.
- W2372091777 modified "2023-09-23" @default.
- W2372091777 title "Automatic Web News Content Extraction Based on CRFs" @default.
- W2372091777 hasPublicationYear "2011" @default.
- W2372091777 type Work @default.
- W2372091777 sameAs 2372091777 @default.
- W2372091777 citedByCount "0" @default.
- W2372091777 crossrefType "journal-article" @default.
- W2372091777 hasAuthorship W2372091777A5054572518 @default.
- W2372091777 hasConcept C108010975 @default.
- W2372091777 hasConcept C113174947 @default.
- W2372091777 hasConcept C124101348 @default.
- W2372091777 hasConcept C134306372 @default.
- W2372091777 hasConcept C136764020 @default.
- W2372091777 hasConcept C137922610 @default.
- W2372091777 hasConcept C152565575 @default.
- W2372091777 hasConcept C154945302 @default.
- W2372091777 hasConcept C173801870 @default.
- W2372091777 hasConcept C195807954 @default.
- W2372091777 hasConcept C21959979 @default.
- W2372091777 hasConcept C23123220 @default.
- W2372091777 hasConcept C2524010 @default.
- W2372091777 hasConcept C2775953691 @default.
- W2372091777 hasConcept C2776324614 @default.
- W2372091777 hasConcept C2777210771 @default.
- W2372091777 hasConcept C33923547 @default.
- W2372091777 hasConcept C41008148 @default.
- W2372091777 hasConcept C6557445 @default.
- W2372091777 hasConcept C81639021 @default.
- W2372091777 hasConcept C86803240 @default.
- W2372091777 hasConceptScore W2372091777C108010975 @default.
- W2372091777 hasConceptScore W2372091777C113174947 @default.
- W2372091777 hasConceptScore W2372091777C124101348 @default.
- W2372091777 hasConceptScore W2372091777C134306372 @default.
- W2372091777 hasConceptScore W2372091777C136764020 @default.
- W2372091777 hasConceptScore W2372091777C137922610 @default.
- W2372091777 hasConceptScore W2372091777C152565575 @default.
- W2372091777 hasConceptScore W2372091777C154945302 @default.
- W2372091777 hasConceptScore W2372091777C173801870 @default.
- W2372091777 hasConceptScore W2372091777C195807954 @default.
- W2372091777 hasConceptScore W2372091777C21959979 @default.
- W2372091777 hasConceptScore W2372091777C23123220 @default.
- W2372091777 hasConceptScore W2372091777C2524010 @default.
- W2372091777 hasConceptScore W2372091777C2775953691 @default.
- W2372091777 hasConceptScore W2372091777C2776324614 @default.
- W2372091777 hasConceptScore W2372091777C2777210771 @default.
- W2372091777 hasConceptScore W2372091777C33923547 @default.
- W2372091777 hasConceptScore W2372091777C41008148 @default.
- W2372091777 hasConceptScore W2372091777C6557445 @default.
- W2372091777 hasConceptScore W2372091777C81639021 @default.
- W2372091777 hasConceptScore W2372091777C86803240 @default.
- W2372091777 hasLocation W23720917771 @default.
- W2372091777 hasOpenAccess W2372091777 @default.
- W2372091777 hasPrimaryLocation W23720917771 @default.
- W2372091777 hasRelatedWork W1489897710 @default.
- W2372091777 hasRelatedWork W2026345620 @default.
- W2372091777 hasRelatedWork W2029331575 @default.
- W2372091777 hasRelatedWork W2034607275 @default.
- W2372091777 hasRelatedWork W2093075751 @default.
- W2372091777 hasRelatedWork W2096482843 @default.
- W2372091777 hasRelatedWork W2119602842 @default.
- W2372091777 hasRelatedWork W2144966661 @default.
- W2372091777 hasRelatedWork W2161127713 @default.
- W2372091777 hasRelatedWork W2280422423 @default.
- W2372091777 hasRelatedWork W2316898980 @default.
- W2372091777 hasRelatedWork W2362238332 @default.
- W2372091777 hasRelatedWork W2367263001 @default.
- W2372091777 hasRelatedWork W2383771012 @default.
- W2372091777 hasRelatedWork W2389787676 @default.
- W2372091777 hasRelatedWork W2392681522 @default.
- W2372091777 hasRelatedWork W2393035056 @default.
- W2372091777 hasRelatedWork W2544597765 @default.
- W2372091777 hasRelatedWork W3013374318 @default.
- W2372091777 hasRelatedWork W2824776657 @default.
- W2372091777 isParatext "false" @default.
- W2372091777 isRetracted "false" @default.
- W2372091777 magId "2372091777" @default.
- W2372091777 workType "article" @default.
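
Each row above follows the shape of the query pattern: a subject, a predicate (`?p`), an object (`?o`) and a graph (`?g`, here always `default`). As a minimal sketch, assuming the rows are available as plain text in exactly this layout, they could be read into tuples as follows. The `Quad` type, the `ROW` pattern and the `parse_rows` function are hypothetical names introduced for illustration; they are not part of SemOpenAlex or of any library.

```python
import re
from typing import List, NamedTuple


class Quad(NamedTuple):
    """One listing row (hypothetical container, not a SemOpenAlex type)."""
    subject: str
    predicate: str
    obj: str
    graph: str


# Assumed row layout: "- <subject> <predicate> <object> @<graph>."
# The object may contain spaces (e.g. a quoted abstract), so it is matched lazily.
ROW = re.compile(r'^-\s+(\S+)\s+(\S+)\s+(.*?)\s+@(\w+)\.\s*$', re.DOTALL)


def parse_rows(text: str) -> List[Quad]:
    """Parse listing rows into Quad tuples, skipping lines that do not match."""
    quads = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header or pagination line, not a data row
        subject, predicate, obj, graph = match.groups()
        # Drop the surrounding double quotes of literal values.
        if len(obj) >= 2 and obj[0] == '"' and obj[-1] == '"':
            obj = obj[1:-1]
        quads.append(Quad(subject, predicate, obj, graph))
    return quads


# A few rows copied from the listing above.
sample = '''- W2372091777 date "2011-01-01" @default.
- W2372091777 type Work @default.
- W2372091777 hasConcept C41008148 @default.
- W2372091777 hasConcept C154945302 @default.
'''

quads = parse_rows(sample)
concepts = [q.obj for q in quads if q.predicate == "hasConcept"]
print(concepts)
```

Grouping by predicate in this way makes it straightforward to collect, for example, all `hasConcept` or `hasRelatedWork` identifiers of the work. The sketch does not distinguish literals from identifiers beyond stripping quotes.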