Matches in SemOpenAlex for { <https://semopenalex.org/work/W2616996263> ?p ?o ?g. }
Showing items 1 to 65 of 65 with 100 items per page.
- W2616996263 abstract "Background. There is huge amount of full-text biomedical literatures available in public repositories like PubMed Central (PMC). However, a substantial number of the papers are in Portable Document Format (PDF) and do not provide plain text format ready for text mining and natural language processing (NLP). Although there exist many PDF-to-text converters, they still suffer from several challenges while processing biomedical PDFs, such as the correct transcription of titles/abstracts, segmenting references/acknowledgements, special characters, jumbling errors (the wrong order of the text), and word boundaries. Methods. In this paper, we present bioPDFX, a novel tool which complements weaknesses with strengths of multiple state-of-the-art methods and then applies machine learning methods to address all issues above. Results. The experiment results on publications of Genome Wide Association Studies (GWAS) demonstrated that bioPDFX significantly improved the quality of XML comparing to state-of-the-art PDF-to-XML converter, leading to a biomedical database more suitable for text mining. Discussion. Overall, the whole pipeline developed in this paper makes the published literature in form of PDF files much better suited for text mining tasks, while slightly improving the overall text quality as well. The service is open to access freely at URL: http://textmining.ucsd.edu:9000 . A list of PubMed Central IDs of the 941 articles (see Supplemental File 1) used in this study is available for download at the same URL. The instructions of how to run the service with a PubMed ID are described in Supplemental File 2." @default.
- W2616996263 created "2017-06-05" @default.
- W2616996263 creator A5018789011 @default.
- W2616996263 creator A5035414075 @default.
- W2616996263 creator A5040580626 @default.
- W2616996263 creator A5041141821 @default.
- W2616996263 creator A5046477105 @default.
- W2616996263 creator A5047894920 @default.
- W2616996263 date "2017-05-26" @default.
- W2616996263 modified "2023-09-23" @default.
- W2616996263 title "bioPDFX: preparing PDF scientific articles for biomedical text mining" @default.
- W2616996263 cites W1487940167 @default.
- W2616996263 cites W1647671624 @default.
- W2616996263 cites W1991133427 @default.
- W2616996263 cites W2001642682 @default.
- W2616996263 cites W2007321142 @default.
- W2616996263 cites W2066043610 @default.
- W2616996263 cites W2080051508 @default.
- W2616996263 cites W2096525273 @default.
- W2616996263 cites W2116868464 @default.
- W2616996263 cites W2117446594 @default.
- W2616996263 cites W2288271880 @default.
- W2616996263 cites W2294385208 @default.
- W2616996263 cites W99399284 @default.
- W2616996263 doi "https://doi.org/10.7287/peerj.preprints.2993v1" @default.
- W2616996263 hasPublicationYear "2017" @default.
- W2616996263 type Work @default.
- W2616996263 sameAs 2616996263 @default.
- W2616996263 citedByCount "0" @default.
- W2616996263 crossrefType "posted-content" @default.
- W2616996263 hasAuthorship W2616996263A5018789011 @default.
- W2616996263 hasAuthorship W2616996263A5035414075 @default.
- W2616996263 hasAuthorship W2616996263A5040580626 @default.
- W2616996263 hasAuthorship W2616996263A5041141821 @default.
- W2616996263 hasAuthorship W2616996263A5046477105 @default.
- W2616996263 hasAuthorship W2616996263A5047894920 @default.
- W2616996263 hasBestOaLocation W26169962631 @default.
- W2616996263 hasConcept C127413603 @default.
- W2616996263 hasConcept C2522767166 @default.
- W2616996263 hasConcept C41008148 @default.
- W2616996263 hasConcept C55587333 @default.
- W2616996263 hasConceptScore W2616996263C127413603 @default.
- W2616996263 hasConceptScore W2616996263C2522767166 @default.
- W2616996263 hasConceptScore W2616996263C41008148 @default.
- W2616996263 hasConceptScore W2616996263C55587333 @default.
- W2616996263 hasLocation W26169962631 @default.
- W2616996263 hasOpenAccess W2616996263 @default.
- W2616996263 hasPrimaryLocation W26169962631 @default.
- W2616996263 hasRelatedWork W118520769 @default.
- W2616996263 hasRelatedWork W1553529581 @default.
- W2616996263 hasRelatedWork W2071736301 @default.
- W2616996263 hasRelatedWork W2398152448 @default.
- W2616996263 hasRelatedWork W2502073295 @default.
- W2616996263 hasRelatedWork W2917244678 @default.
- W2616996263 hasRelatedWork W2972097333 @default.
- W2616996263 hasRelatedWork W3043082703 @default.
- W2616996263 hasRelatedWork W3135369118 @default.
- W2616996263 hasRelatedWork W3158495297 @default.
- W2616996263 hasRelatedWork W3204870073 @default.
- W2616996263 hasRelatedWork W9328536 @default.
- W2616996263 hasRelatedWork W3125926674 @default.
- W2616996263 isParatext "false" @default.
- W2616996263 isRetracted "false" @default.
- W2616996263 magId "2616996263" @default.
- W2616996263 workType "article" @default.