Matches in SemOpenAlex for { <https://semopenalex.org/work/W2072326204> ?p ?o ?g. }
Showing items 1 to 83 of 83 with 100 items per page.
- W2072326204 endingPage "1293" @default.
- W2072326204 startingPage "1276" @default.
- W2072326204 abstract "In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of specific genres, including presentations, book chapters, technical papers, brochures, reports, and letters. Previously, methods have been proposed mainly for title extraction from research papers. It has not been clear whether it could be possible to conduct automatic title extraction from general documents. As a case study, we consider extraction from Office including Word and PowerPoint. In our approach, we annotate titles in sample documents (for Word and PowerPoint, respectively) and take them as training data, train machine learning models, and perform title extraction using the trained models. Our method is unique in that we mainly utilize formatting information such as font size as features in the models. It turns out that the use of formatting information can lead to quite accurate extraction from general documents. Precision and recall for title extraction from Word are 0.810 and 0.837, respectively, and precision and recall for title extraction from PowerPoint are 0.875 and 0.895, respectively in an experiment on intranet data. Other important new findings in this work include that we can train models in one domain and apply them to other domains, and more surprisingly we can even train models in one language and apply them to other languages. Moreover, we can significantly improve search ranking results in document retrieval by using the extracted titles." @default.
- W2072326204 created "2016-06-24" @default.
- W2072326204 creator A5011542448 @default.
- W2072326204 creator A5041083459 @default.
- W2072326204 creator A5042259471 @default.
- W2072326204 creator A5061320390 @default.
- W2072326204 creator A5076637714 @default.
- W2072326204 creator A5086275760 @default.
- W2072326204 date "2006-09-01" @default.
- W2072326204 modified "2023-09-27" @default.
- W2072326204 title "Automatic extraction of titles from general documents using machine learning" @default.
- W2072326204 cites W1528056001 @default.
- W2072326204 cites W1998839545 @default.
- W2072326204 cites W2008652694 @default.
- W2072326204 cites W2034797903 @default.
- W2072326204 cites W2038124402 @default.
- W2072326204 cites W2039231973 @default.
- W2072326204 cites W2046325278 @default.
- W2072326204 cites W2062159222 @default.
- W2072326204 cites W2080928448 @default.
- W2072326204 cites W2085030399 @default.
- W2072326204 cites W2123028711 @default.
- W2072326204 cites W2158511245 @default.
- W2072326204 cites W2158755884 @default.
- W2072326204 cites W4239510810 @default.
- W2072326204 cites W4253723135 @default.
- W2072326204 doi "https://doi.org/10.1016/j.ipm.2005.12.001" @default.
- W2072326204 hasPublicationYear "2006" @default.
- W2072326204 type Work @default.
- W2072326204 sameAs 2072326204 @default.
- W2072326204 citedByCount "46" @default.
- W2072326204 countsByYear W20723262042012 @default.
- W2072326204 countsByYear W20723262042013 @default.
- W2072326204 countsByYear W20723262042014 @default.
- W2072326204 countsByYear W20723262042015 @default.
- W2072326204 countsByYear W20723262042016 @default.
- W2072326204 countsByYear W20723262042017 @default.
- W2072326204 countsByYear W20723262042019 @default.
- W2072326204 countsByYear W20723262042020 @default.
- W2072326204 crossrefType "journal-article" @default.
- W2072326204 hasAuthorship W2072326204A5011542448 @default.
- W2072326204 hasAuthorship W2072326204A5041083459 @default.
- W2072326204 hasAuthorship W2072326204A5042259471 @default.
- W2072326204 hasAuthorship W2072326204A5061320390 @default.
- W2072326204 hasAuthorship W2072326204A5076637714 @default.
- W2072326204 hasAuthorship W2072326204A5086275760 @default.
- W2072326204 hasConcept C154945302 @default.
- W2072326204 hasConcept C185592680 @default.
- W2072326204 hasConcept C195807954 @default.
- W2072326204 hasConcept C204321447 @default.
- W2072326204 hasConcept C23123220 @default.
- W2072326204 hasConcept C41008148 @default.
- W2072326204 hasConcept C43617362 @default.
- W2072326204 hasConcept C4725764 @default.
- W2072326204 hasConceptScore W2072326204C154945302 @default.
- W2072326204 hasConceptScore W2072326204C185592680 @default.
- W2072326204 hasConceptScore W2072326204C195807954 @default.
- W2072326204 hasConceptScore W2072326204C204321447 @default.
- W2072326204 hasConceptScore W2072326204C23123220 @default.
- W2072326204 hasConceptScore W2072326204C41008148 @default.
- W2072326204 hasConceptScore W2072326204C43617362 @default.
- W2072326204 hasConceptScore W2072326204C4725764 @default.
- W2072326204 hasIssue "5" @default.
- W2072326204 hasLocation W20723262041 @default.
- W2072326204 hasOpenAccess W2072326204 @default.
- W2072326204 hasPrimaryLocation W20723262041 @default.
- W2072326204 hasRelatedWork W104581431 @default.
- W2072326204 hasRelatedWork W1548492051 @default.
- W2072326204 hasRelatedWork W1561729373 @default.
- W2072326204 hasRelatedWork W1601355022 @default.
- W2072326204 hasRelatedWork W1788528807 @default.
- W2072326204 hasRelatedWork W1975174578 @default.
- W2072326204 hasRelatedWork W2368651715 @default.
- W2072326204 hasRelatedWork W2393978999 @default.
- W2072326204 hasRelatedWork W2725657302 @default.
- W2072326204 hasRelatedWork W3107474891 @default.
- W2072326204 hasVolume "42" @default.
- W2072326204 isParatext "false" @default.
- W2072326204 isRetracted "false" @default.
- W2072326204 magId "2072326204" @default.
- W2072326204 workType "article" @default.
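
The rows above all follow one pattern: a leading dash, then subject, predicate, object, and a graph name prefixed with `@` and closed with a period. As a minimal sketch of how such a listing could be read back into structured form, the Python below parses one row into a tuple. The names `Quad`, `ROW`, and `parse_row` are hypothetical and introduced only for illustration; they are not part of SemOpenAlex or any library, and the sketch assumes literals are double-quoted and contain no line breaks that split a row.

```python
import re
from typing import NamedTuple, Optional


class Quad(NamedTuple):
    """One listing row (hypothetical helper type, not a SemOpenAlex API)."""
    subject: str
    predicate: str
    obj: str
    graph: str
    is_literal: bool


# Assumed row shape: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal or a bare identifier.
ROW = re.compile(
    r'^-\s+(\S+)\s+(\S+)\s+(?:"(.*)"|(\S+))\s+@(\S+?)\.$',
    re.DOTALL,
)


def parse_row(line: str) -> Optional[Quad]:
    """Return a Quad for a listing row, or None for non-row lines."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    subject, predicate, literal, bare, graph = m.groups()
    if literal is not None:
        # Quoted object: keep the text without the surrounding quotes.
        return Quad(subject, predicate, literal, graph, True)
    # Bare object: an identifier such as a work, author, or concept ID.
    return Quad(subject, predicate, bare, graph, False)


if __name__ == "__main__":
    rows = [
        '- W2072326204 hasVolume "42" @default.',
        '- W2072326204 cites W1528056001 @default.',
        'Matches in SemOpenAlex for { ... }',
    ]
    for row in rows:
        print(parse_row(row))
```

Header lines that do not start with a dash return `None`, so they can be filtered out before grouping the remaining rows by predicate (for example, collecting every `cites` object).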