Matches in SemOpenAlex for { <https://semopenalex.org/work/W4223993431> ?p ?o ?g. }
Showing items 1 to 71 of 71 with 100 items per page.
- W4223993431 abstract "Abstract Motivation: Application of chemical named entity recognition (CNER) algorithms allows retrieval of information from texts about chemical compound identifiers and creates associations with physical-chemical properties and biological activities. Scientific texts represent low-formalized sources of information. Most methods aimed at CNER are based on machine learning approaches, including conditional random fields and deep neural networks. In general, most machine learning approaches require either vector or sparse word representation of texts, which is produced using a fixed-size vocabulary from a corpus of texts. Chemical named entities (CNEs) constitute only a small fraction of the whole text, and the datasets used for training are highly imbalanced. Methods and results: We propose a new method for extracting CNEs from texts based on the naïve Bayes classifier. In contrast to the earlier developed CNER methods, our approach uses the representation of the data as a set of fragments of text (FoTs) with the subsequent preparation of a set of multi-n-grams (sequences from one to n symbols) for each FoT. Therefore, it does not require the usage of a predefined vocabulary. As a result, our approach may provide the recognition of novel CNEs. The average values of invariant accuracy (IA) range from 0.95 to 0.99 depending on the context window and the size values of n of the multi-n-grams. We applied the developed algorithm to the extracted CNEs of potential Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease (Mpro) inhibitors. A set of CNEs corresponding to the chemical substances evaluated in the biochemical assays used for the discovery of Mpro inhibitors was retrieved. Manual analysis of the appropriate texts showed that CNEs of potential SARS-CoV-2 Mpro inhibitors were successfully identified by our method.
Conclusion: The obtained results show that the proposed method can be used for filtering out words that are not related to CNEs; therefore, it can be successfully applied to the extraction of CNEs for the purposes of cheminformatics and medicinal chemistry." @default.
- W4223993431 created "2022-04-19" @default.
- W4223993431 creator A5029121475 @default.
- W4223993431 creator A5045408104 @default.
- W4223993431 creator A5052390470 @default.
- W4223993431 creator A5078075050 @default.
- W4223993431 date "2022-04-13" @default.
- W4223993431 modified "2023-10-18" @default.
- W4223993431 title "Chemical named entity recognition in the texts of scientific publications using the naïve Bayes classifier approach" @default.
- W4223993431 doi "https://doi.org/10.21203/rs.3.rs-1521134/v1" @default.
- W4223993431 hasPublicationYear "2022" @default.
- W4223993431 type Work @default.
- W4223993431 citedByCount "0" @default.
- W4223993431 crossrefType "posted-content" @default.
- W4223993431 hasAuthorship W4223993431A5029121475 @default.
- W4223993431 hasAuthorship W4223993431A5045408104 @default.
- W4223993431 hasAuthorship W4223993431A5052390470 @default.
- W4223993431 hasAuthorship W4223993431A5078075050 @default.
- W4223993431 hasBestOaLocation W42239934311 @default.
- W4223993431 hasConcept C119857082 @default.
- W4223993431 hasConcept C12267149 @default.
- W4223993431 hasConcept C138885662 @default.
- W4223993431 hasConcept C152565575 @default.
- W4223993431 hasConcept C153180895 @default.
- W4223993431 hasConcept C154504017 @default.
- W4223993431 hasConcept C154945302 @default.
- W4223993431 hasConcept C162324750 @default.
- W4223993431 hasConcept C187736073 @default.
- W4223993431 hasConcept C199360897 @default.
- W4223993431 hasConcept C204321447 @default.
- W4223993431 hasConcept C2777601683 @default.
- W4223993431 hasConcept C2779135771 @default.
- W4223993431 hasConcept C2780451532 @default.
- W4223993431 hasConcept C41008148 @default.
- W4223993431 hasConcept C41895202 @default.
- W4223993431 hasConcept C52001869 @default.
- W4223993431 hasConcept C95623464 @default.
- W4223993431 hasConceptScore W4223993431C119857082 @default.
- W4223993431 hasConceptScore W4223993431C12267149 @default.
- W4223993431 hasConceptScore W4223993431C138885662 @default.
- W4223993431 hasConceptScore W4223993431C152565575 @default.
- W4223993431 hasConceptScore W4223993431C153180895 @default.
- W4223993431 hasConceptScore W4223993431C154504017 @default.
- W4223993431 hasConceptScore W4223993431C154945302 @default.
- W4223993431 hasConceptScore W4223993431C162324750 @default.
- W4223993431 hasConceptScore W4223993431C187736073 @default.
- W4223993431 hasConceptScore W4223993431C199360897 @default.
- W4223993431 hasConceptScore W4223993431C204321447 @default.
- W4223993431 hasConceptScore W4223993431C2777601683 @default.
- W4223993431 hasConceptScore W4223993431C2779135771 @default.
- W4223993431 hasConceptScore W4223993431C2780451532 @default.
- W4223993431 hasConceptScore W4223993431C41008148 @default.
- W4223993431 hasConceptScore W4223993431C41895202 @default.
- W4223993431 hasConceptScore W4223993431C52001869 @default.
- W4223993431 hasConceptScore W4223993431C95623464 @default.
- W4223993431 hasLocation W42239934311 @default.
- W4223993431 hasOpenAccess W4223993431 @default.
- W4223993431 hasPrimaryLocation W42239934311 @default.
- W4223993431 hasRelatedWork W2041636156 @default.
- W4223993431 hasRelatedWork W2160451891 @default.
- W4223993431 hasRelatedWork W2539163683 @default.
- W4223993431 hasRelatedWork W2595988085 @default.
- W4223993431 hasRelatedWork W2947903144 @default.
- W4223993431 hasRelatedWork W2979979539 @default.
- W4223993431 hasRelatedWork W3127425528 @default.
- W4223993431 hasRelatedWork W3211499183 @default.
- W4223993431 hasRelatedWork W4205958290 @default.
- W4223993431 hasRelatedWork W4313549251 @default.
- W4223993431 isParatext "false" @default.
- W4223993431 isRetracted "false" @default.
- W4223993431 workType "article" @default.