Matches in SemOpenAlex for { <https://semopenalex.org/work/W2145076056> ?p ?o ?g. }
Showing items 1 to 69 of 69 with 100 items per page.
- W2145076056 abstract "This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, in reality HTML titles are often bogus. It is desirable to conduct automatic extraction of titles from the bodies of HTML documents. This is an issue which does not seem to have been investigated previously. In this paper, we take a supervised machine learning approach to address the problem. We propose a specification on HTML titles. We utilize format information such as font size, position, and font weight as features in title extraction. Our method significantly outperforms the baseline method of using the lines in largest font size as title (20.9%-32.6% improvement in F1 score). As application, we consider web page retrieval. We use the TREC Web Track data for evaluation. We propose a new method for HTML documents retrieval using extracted titles. Experimental results indicate that the use of both extracted titles and title fields is almost always better than the use of title fields alone; the use of extracted titles is particularly helpful in the task of named page finding (23.1%-29.0% improvements)." @default.
- W2145076056 created "2016-06-24" @default.
- W2145076056 creator A5011542448 @default.
- W2145076056 creator A5015800533 @default.
- W2145076056 creator A5042259471 @default.
- W2145076056 creator A5055258592 @default.
- W2145076056 creator A5076447478 @default.
- W2145076056 creator A5086275760 @default.
- W2145076056 creator A5087920747 @default.
- W2145076056 date "2005-08-15" @default.
- W2145076056 modified "2023-09-26" @default.
- W2145076056 title "Title extraction from bodies of HTML documents and its application to web page retrieval" @default.
- W2145076056 cites W2015551056 @default.
- W2145076056 cites W2043440779 @default.
- W2145076056 cites W2083745421 @default.
- W2145076056 cites W2085030399 @default.
- W2145076056 cites W2085394232 @default.
- W2145076056 cites W2129595335 @default.
- W2145076056 cites W2153072229 @default.
- W2145076056 cites W2163915185 @default.
- W2145076056 cites W2167859982 @default.
- W2145076056 doi "https://doi.org/10.1145/1076034.1076079" @default.
- W2145076056 hasPublicationYear "2005" @default.
- W2145076056 type Work @default.
- W2145076056 sameAs 2145076056 @default.
- W2145076056 citedByCount "53" @default.
- W2145076056 countsByYear W21450760562012 @default.
- W2145076056 countsByYear W21450760562014 @default.
- W2145076056 countsByYear W21450760562015 @default.
- W2145076056 countsByYear W21450760562016 @default.
- W2145076056 countsByYear W21450760562017 @default.
- W2145076056 countsByYear W21450760562019 @default.
- W2145076056 countsByYear W21450760562020 @default.
- W2145076056 countsByYear W21450760562023 @default.
- W2145076056 crossrefType "proceedings-article" @default.
- W2145076056 hasAuthorship W2145076056A5011542448 @default.
- W2145076056 hasAuthorship W2145076056A5015800533 @default.
- W2145076056 hasAuthorship W2145076056A5042259471 @default.
- W2145076056 hasAuthorship W2145076056A5055258592 @default.
- W2145076056 hasAuthorship W2145076056A5076447478 @default.
- W2145076056 hasAuthorship W2145076056A5086275760 @default.
- W2145076056 hasAuthorship W2145076056A5087920747 @default.
- W2145076056 hasConcept C136764020 @default.
- W2145076056 hasConcept C21959979 @default.
- W2145076056 hasConcept C23123220 @default.
- W2145076056 hasConcept C41008148 @default.
- W2145076056 hasConcept C81639021 @default.
- W2145076056 hasConceptScore W2145076056C136764020 @default.
- W2145076056 hasConceptScore W2145076056C21959979 @default.
- W2145076056 hasConceptScore W2145076056C23123220 @default.
- W2145076056 hasConceptScore W2145076056C41008148 @default.
- W2145076056 hasConceptScore W2145076056C81639021 @default.
- W2145076056 hasLocation W21450760561 @default.
- W2145076056 hasOpenAccess W2145076056 @default.
- W2145076056 hasPrimaryLocation W21450760561 @default.
- W2145076056 hasRelatedWork W1563775010 @default.
- W2145076056 hasRelatedWork W2130855697 @default.
- W2145076056 hasRelatedWork W2144190808 @default.
- W2145076056 hasRelatedWork W2348417979 @default.
- W2145076056 hasRelatedWork W2373481072 @default.
- W2145076056 hasRelatedWork W2411679502 @default.
- W2145076056 hasRelatedWork W3216588747 @default.
- W2145076056 hasRelatedWork W67510309 @default.
- W2145076056 hasRelatedWork W2513545296 @default.
- W2145076056 hasRelatedWork W2592441986 @default.
- W2145076056 isParatext "false" @default.
- W2145076056 isRetracted "false" @default.
- W2145076056 magId "2145076056" @default.
- W2145076056 workType "article" @default.