Matches in SemOpenAlex for { <https://semopenalex.org/work/W2155364607> ?p ?o ?g. }
- W2155364607 abstract "The FIASCO system implements a machine-learning approach for the automatic removal of boilerplate (navigation bars, link lists, page headers and footers, etc.) from Web pages in order to make them available as a clean and useful corpus for linguistic purposes. The system parses an HTML document into a DOM tree representation and identifies a set of disjoint subtrees that correspond to text blocks, headers or list items. Each block is then represented as a vector of linguistic, structural and visual features. A support vector machine classifier is used to distinguish between “clean” and “dirty” blocks. Dirty blocks are removed from the HTML tree before it is passed to the Lynx browser for conversion into plain text. The SVM classifier was trained and evaluated on a manually cleaned dataset of 158 English Web pages, the FIASCO gold standard." @default.
- W2155364607 created "2016-06-24" @default.
- W2155364607 creator A5004592732 @default.
- W2155364607 creator A5013414758 @default.
- W2155364607 creator A5014048772 @default.
- W2155364607 creator A5015040943 @default.
- W2155364607 creator A5016661742 @default.
- W2155364607 creator A5048713693 @default.
- W2155364607 creator A5049842774 @default.
- W2155364607 creator A5052805202 @default.
- W2155364607 creator A5054627290 @default.
- W2155364607 creator A5063945567 @default.
- W2155364607 creator A5067556973 @default.
- W2155364607 creator A5069645767 @default.
- W2155364607 creator A5087220281 @default.
- W2155364607 creator A5090070646 @default.
- W2155364607 date "2007-01-01" @default.
- W2155364607 modified "2023-09-27" @default.
- W2155364607 title "FIASCO: Filtering the Internet by Automatic Subtree Classification, Osnabrück" @default.
- W2155364607 cites W1356168 @default.
- W2155364607 cites W2153635508 @default.
- W2155364607 cites W2157963512 @default.
- W2155364607 cites W3023786531 @default.
- W2155364607 cites W3210232381 @default.
- W2155364607 hasPublicationYear "2007" @default.
- W2155364607 type Work @default.
- W2155364607 sameAs 2155364607 @default.
- W2155364607 citedByCount "2" @default.
- W2155364607 countsByYear W21553646072022 @default.
- W2155364607 crossrefType "journal-article" @default.
- W2155364607 hasAuthorship W2155364607A5004592732 @default.
- W2155364607 hasAuthorship W2155364607A5013414758 @default.
- W2155364607 hasAuthorship W2155364607A5014048772 @default.
- W2155364607 hasAuthorship W2155364607A5015040943 @default.
- W2155364607 hasAuthorship W2155364607A5016661742 @default.
- W2155364607 hasAuthorship W2155364607A5048713693 @default.
- W2155364607 hasAuthorship W2155364607A5049842774 @default.
- W2155364607 hasAuthorship W2155364607A5052805202 @default.
- W2155364607 hasAuthorship W2155364607A5054627290 @default.
- W2155364607 hasAuthorship W2155364607A5063945567 @default.
- W2155364607 hasAuthorship W2155364607A5067556973 @default.
- W2155364607 hasAuthorship W2155364607A5069645767 @default.
- W2155364607 hasAuthorship W2155364607A5087220281 @default.
- W2155364607 hasAuthorship W2155364607A5090070646 @default.
- W2155364607 hasConcept C110875604 @default.
- W2155364607 hasConcept C114614502 @default.
- W2155364607 hasConcept C12267149 @default.
- W2155364607 hasConcept C136764020 @default.
- W2155364607 hasConcept C154945302 @default.
- W2155364607 hasConcept C199360897 @default.
- W2155364607 hasConcept C204321447 @default.
- W2155364607 hasConcept C21959979 @default.
- W2155364607 hasConcept C23123220 @default.
- W2155364607 hasConcept C33923547 @default.
- W2155364607 hasConcept C41008148 @default.
- W2155364607 hasConcept C45340560 @default.
- W2155364607 hasConcept C75701414 @default.
- W2155364607 hasConcept C95623464 @default.
- W2155364607 hasConceptScore W2155364607C110875604 @default.
- W2155364607 hasConceptScore W2155364607C114614502 @default.
- W2155364607 hasConceptScore W2155364607C12267149 @default.
- W2155364607 hasConceptScore W2155364607C136764020 @default.
- W2155364607 hasConceptScore W2155364607C154945302 @default.
- W2155364607 hasConceptScore W2155364607C199360897 @default.
- W2155364607 hasConceptScore W2155364607C204321447 @default.
- W2155364607 hasConceptScore W2155364607C21959979 @default.
- W2155364607 hasConceptScore W2155364607C23123220 @default.
- W2155364607 hasConceptScore W2155364607C33923547 @default.
- W2155364607 hasConceptScore W2155364607C41008148 @default.
- W2155364607 hasConceptScore W2155364607C45340560 @default.
- W2155364607 hasConceptScore W2155364607C75701414 @default.
- W2155364607 hasConceptScore W2155364607C95623464 @default.
- W2155364607 hasLocation W21553646071 @default.
- W2155364607 hasOpenAccess W2155364607 @default.
- W2155364607 hasPrimaryLocation W21553646071 @default.
- W2155364607 hasRelatedWork W1488854785 @default.
- W2155364607 hasRelatedWork W1977746397 @default.
- W2155364607 hasRelatedWork W2010746350 @default.
- W2155364607 hasRelatedWork W2084886972 @default.
- W2155364607 hasRelatedWork W2094609208 @default.
- W2155364607 hasRelatedWork W2111689752 @default.
- W2155364607 hasRelatedWork W2124506639 @default.
- W2155364607 hasRelatedWork W2135812132 @default.
- W2155364607 hasRelatedWork W2149964649 @default.
- W2155364607 hasRelatedWork W2162449980 @default.
- W2155364607 hasRelatedWork W2163155321 @default.
- W2155364607 hasRelatedWork W2168358004 @default.
- W2155364607 hasRelatedWork W2171364811 @default.
- W2155364607 hasRelatedWork W2349251115 @default.
- W2155364607 hasRelatedWork W2355261386 @default.
- W2155364607 hasRelatedWork W2355888406 @default.
- W2155364607 hasRelatedWork W2464591306 @default.
- W2155364607 hasRelatedWork W3184312050 @default.
- W2155364607 hasRelatedWork W857958923 @default.
- W2155364607 hasRelatedWork W2478318369 @default.
- W2155364607 isParatext "false" @default.
- W2155364607 isRetracted "false" @default.
- W2155364607 magId "2155364607" @default.
- W2155364607 workType "article" @default.