Matches in SemOpenAlex for { <https://semopenalex.org/work/W2890174976> ?p ?o ?g. }
Showing items 1 to 74 of 74 with 100 items per page.
- W2890174976 abstract "The text in many web documents is organized into a hierarchy of section titles and corresponding prose content, a structure which provides potentially exploitable information on discourse structure and topicality. However, this organization is generally discarded during text collection, and collecting it is not straightforward: the same visual organization can be implemented in a myriad of different ways in the underlying HTML. To remedy this, we present a flexible system for automatically extracting the hierarchical section titles and prose organization of web documents irrespective of differences in HTML representation. This system uses features from syntax, semantics, discourse and markup to build two models which classify HTML text into section titles and prose text. When tested on three different domains of web text, our domain-independent system achieves an overall precision of 0.82 and a recall of 0.98. The domain-dependent variation produces very high precision (0.99) at the expense of recall (0.75). These results exhibit a robust level of accuracy suitable for enhancing question answering, information extraction, and summarization." @default.
- W2890174976 created "2018-09-27" @default.
- W2890174976 creator A5056978982 @default.
- W2890174976 creator A5061450676 @default.
- W2890174976 creator A5081563886 @default.
- W2890174976 date "2018-01-01" @default.
- W2890174976 modified "2023-10-17" @default.
- W2890174976 title "Supervised and Unsupervised Methods for Robust Separation of Section Titles and Prose Text in Web Documents" @default.
- W2890174976 cites W1496244604 @default.
- W2890174976 cites W1553229631 @default.
- W2890174976 cites W1567298333 @default.
- W2890174976 cites W1675962973 @default.
- W2890174976 cites W168564468 @default.
- W2890174976 cites W205532704 @default.
- W2890174976 cites W2080132606 @default.
- W2890174976 cites W2101234009 @default.
- W2890174976 cites W2123442489 @default.
- W2890174976 cites W2133990480 @default.
- W2890174976 cites W2250601901 @default.
- W2890174976 cites W2252264672 @default.
- W2890174976 cites W2271840356 @default.
- W2890174976 cites W2402929825 @default.
- W2890174976 cites W2962837431 @default.
- W2890174976 cites W757417731 @default.
- W2890174976 doi "https://doi.org/10.18653/v1/d18-1099" @default.
- W2890174976 hasPublicationYear "2018" @default.
- W2890174976 type Work @default.
- W2890174976 sameAs 2890174976 @default.
- W2890174976 citedByCount "12" @default.
- W2890174976 countsByYear W28901749762019 @default.
- W2890174976 countsByYear W28901749762020 @default.
- W2890174976 countsByYear W28901749762021 @default.
- W2890174976 countsByYear W28901749762022 @default.
- W2890174976 countsByYear W28901749762023 @default.
- W2890174976 crossrefType "proceedings-article" @default.
- W2890174976 hasAuthorship W2890174976A5056978982 @default.
- W2890174976 hasAuthorship W2890174976A5061450676 @default.
- W2890174976 hasAuthorship W2890174976A5081563886 @default.
- W2890174976 hasBestOaLocation W28901749761 @default.
- W2890174976 hasConcept C111919701 @default.
- W2890174976 hasConcept C119857082 @default.
- W2890174976 hasConcept C136764020 @default.
- W2890174976 hasConcept C154945302 @default.
- W2890174976 hasConcept C204321447 @default.
- W2890174976 hasConcept C23123220 @default.
- W2890174976 hasConcept C2776061190 @default.
- W2890174976 hasConcept C2780129039 @default.
- W2890174976 hasConcept C41008148 @default.
- W2890174976 hasConceptScore W2890174976C111919701 @default.
- W2890174976 hasConceptScore W2890174976C119857082 @default.
- W2890174976 hasConceptScore W2890174976C136764020 @default.
- W2890174976 hasConceptScore W2890174976C154945302 @default.
- W2890174976 hasConceptScore W2890174976C204321447 @default.
- W2890174976 hasConceptScore W2890174976C23123220 @default.
- W2890174976 hasConceptScore W2890174976C2776061190 @default.
- W2890174976 hasConceptScore W2890174976C2780129039 @default.
- W2890174976 hasConceptScore W2890174976C41008148 @default.
- W2890174976 hasLocation W28901749761 @default.
- W2890174976 hasOpenAccess W2890174976 @default.
- W2890174976 hasPrimaryLocation W28901749761 @default.
- W2890174976 hasRelatedWork W2101955803 @default.
- W2890174976 hasRelatedWork W2119214692 @default.
- W2890174976 hasRelatedWork W2144190808 @default.
- W2890174976 hasRelatedWork W2151447942 @default.
- W2890174976 hasRelatedWork W2357241418 @default.
- W2890174976 hasRelatedWork W2366644548 @default.
- W2890174976 hasRelatedWork W2376314740 @default.
- W2890174976 hasRelatedWork W2384888906 @default.
- W2890174976 hasRelatedWork W2611614995 @default.
- W2890174976 hasRelatedWork W2748952813 @default.
- W2890174976 isParatext "false" @default.
- W2890174976 isRetracted "false" @default.
- W2890174976 magId "2890174976" @default.
- W2890174976 workType "article" @default.