Matches in SemOpenAlex for { <https://semopenalex.org/work/W4306640168> ?p ?o ?g. }
- W4306640168 abstract "Identification of new chemical compounds with desired structural diversity and biological properties plays an essential role in drug discovery, yet the construction of such a potential space with elements of 'near-drug' properties is still a challenging task. In this work, we proposed a multimodal chemical information reconstruction system to automatically process, extract and align heterogeneous information from the text descriptions and structural images of chemical patents. Our key innovation lies in a heterogeneous data generator that produces cross-modality training data in the form of text descriptions and Markush structure images, from which a two-branch model with image- and text-processing units can then learn to both recognize heterogeneous chemical entities and simultaneously capture their correspondence. In particular, we have collected chemical structures from ChEMBL database and chemical patents from the European Patent Office and the US Patent and Trademark Office using keywords 'A61P, compound, structure' in the years from 2010 to 2020, and generated heterogeneous chemical information datasets with 210K structural images and 7818 annotated text snippets. Based on the reconstructed results and substituent replacement rules, structural libraries of a huge number of near-drug compounds can be generated automatically. In quantitative evaluations, our model can correctly reconstruct 97% of the molecular images into structured format and achieve an F1-score around 97-98% in the recognition of chemical entities, which demonstrated the effectiveness of our model in automatic information extraction from chemical patents, and hopefully transforming them to a user-friendly, structured molecular database enriching the near-drug space to realize the intelligent retrieval technology of chemical knowledge." @default.
- W4306640168 created "2022-10-18" @default.
- W4306640168 creator A5001077607 @default.
- W4306640168 creator A5011175914 @default.
- W4306640168 creator A5027738049 @default.
- W4306640168 creator A5030234495 @default.
- W4306640168 creator A5037220620 @default.
- W4306640168 creator A5039539407 @default.
- W4306640168 creator A5043148241 @default.
- W4306640168 creator A5054590783 @default.
- W4306640168 creator A5057508424 @default.
- W4306640168 creator A5088699320 @default.
- W4306640168 date "2022-10-17" @default.
- W4306640168 modified "2023-09-30" @default.
- W4306640168 title "Multi-modal chemical information reconstruction from images and texts for exploring the near-drug space" @default.
- W4306640168 cites W1966456689 @default.
- W4306640168 cites W1995089537 @default.
- W4306640168 cites W2001642682 @default.
- W4306640168 cites W2012115735 @default.
- W4306640168 cites W2017254234 @default.
- W4306640168 cites W2023818227 @default.
- W4306640168 cites W2035753075 @default.
- W4306640168 cites W2064675550 @default.
- W4306640168 cites W2080848531 @default.
- W4306640168 cites W2101553882 @default.
- W4306640168 cites W2136794542 @default.
- W4306640168 cites W2165671627 @default.
- W4306640168 cites W2171955717 @default.
- W4306640168 cites W2282821441 @default.
- W4306640168 cites W2523785361 @default.
- W4306640168 cites W2747592475 @default.
- W4306640168 cites W2751418808 @default.
- W4306640168 cites W2767891136 @default.
- W4306640168 cites W2768622420 @default.
- W4306640168 cites W2884561390 @default.
- W4306640168 cites W2902762889 @default.
- W4306640168 cites W2908837618 @default.
- W4306640168 cites W2911361106 @default.
- W4306640168 cites W2920995682 @default.
- W4306640168 cites W2962902328 @default.
- W4306640168 cites W3001449207 @default.
- W4306640168 cites W3043647281 @default.
- W4306640168 cites W3091684735 @default.
- W4306640168 cites W3097598035 @default.
- W4306640168 cites W3112559629 @default.
- W4306640168 cites W3138074128 @default.
- W4306640168 cites W3159789740 @default.
- W4306640168 cites W3213514782 @default.
- W4306640168 cites W4206029367 @default.
- W4306640168 cites W4212837331 @default.
- W4306640168 doi "https://doi.org/10.1093/bib/bbac461" @default.
- W4306640168 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/36252922" @default.
- W4306640168 hasPublicationYear "2022" @default.
- W4306640168 type Work @default.
- W4306640168 citedByCount "2" @default.
- W4306640168 countsByYear W43066401682023 @default.
- W4306640168 crossrefType "journal-article" @default.
- W4306640168 hasAuthorship W4306640168A5001077607 @default.
- W4306640168 hasAuthorship W4306640168A5011175914 @default.
- W4306640168 hasAuthorship W4306640168A5027738049 @default.
- W4306640168 hasAuthorship W4306640168A5030234495 @default.
- W4306640168 hasAuthorship W4306640168A5037220620 @default.
- W4306640168 hasAuthorship W4306640168A5039539407 @default.
- W4306640168 hasAuthorship W4306640168A5043148241 @default.
- W4306640168 hasAuthorship W4306640168A5054590783 @default.
- W4306640168 hasAuthorship W4306640168A5057508424 @default.
- W4306640168 hasAuthorship W4306640168A5088699320 @default.
- W4306640168 hasBestOaLocation W43066401681 @default.
- W4306640168 hasConcept C111919701 @default.
- W4306640168 hasConcept C124101348 @default.
- W4306640168 hasConcept C154945302 @default.
- W4306640168 hasConcept C195807954 @default.
- W4306640168 hasConcept C203394866 @default.
- W4306640168 hasConcept C23123220 @default.
- W4306640168 hasConcept C2778572836 @default.
- W4306640168 hasConcept C41008148 @default.
- W4306640168 hasConcept C60644358 @default.
- W4306640168 hasConcept C63222358 @default.
- W4306640168 hasConcept C68762167 @default.
- W4306640168 hasConcept C74187038 @default.
- W4306640168 hasConcept C77088390 @default.
- W4306640168 hasConcept C86803240 @default.
- W4306640168 hasConcept C99726746 @default.
- W4306640168 hasConceptScore W4306640168C111919701 @default.
- W4306640168 hasConceptScore W4306640168C124101348 @default.
- W4306640168 hasConceptScore W4306640168C154945302 @default.
- W4306640168 hasConceptScore W4306640168C195807954 @default.
- W4306640168 hasConceptScore W4306640168C203394866 @default.
- W4306640168 hasConceptScore W4306640168C23123220 @default.
- W4306640168 hasConceptScore W4306640168C2778572836 @default.
- W4306640168 hasConceptScore W4306640168C41008148 @default.
- W4306640168 hasConceptScore W4306640168C60644358 @default.
- W4306640168 hasConceptScore W4306640168C63222358 @default.
- W4306640168 hasConceptScore W4306640168C68762167 @default.
- W4306640168 hasConceptScore W4306640168C74187038 @default.
- W4306640168 hasConceptScore W4306640168C77088390 @default.
- W4306640168 hasConceptScore W4306640168C86803240 @default.
- W4306640168 hasConceptScore W4306640168C99726746 @default.
- W4306640168 hasFunder F4320321001 @default.
- W4306640168 hasFunder F4320336648 @default.