Matches in SemOpenAlex for { <https://semopenalex.org/work/W3138781613> ?p ?o ?g. }
Showing items 1 to 99 of 99 with 100 items per page.
- W3138781613 endingPage "1569" @default.
- W3138781613 startingPage "1560" @default.
- W3138781613 abstract "Simplified molecular input line entry system (SMILES)-based deep learning models are slowly emerging as an important research topic in cheminformatics. In this study, we introduce SMILES pair encoding (SPE), a data-driven tokenization algorithm. SPE first learns a vocabulary of high-frequency SMILES substrings from a large chemical dataset (e.g., ChEMBL) and then tokenizes SMILES based on the learned vocabulary for the actual training of deep learning models. SPE augments the widely used atom-level tokenization by adding human-readable and chemically explainable SMILES substrings as tokens. Case studies show that SPE can achieve superior performance on both molecular generation and quantitative structure–activity relationship (QSAR) prediction tasks. In particular, the SPE-based generative models outperformed the atom-level tokenization model in the aspects of novelty, diversity, and ability to resemble the training set distribution. The performance of SPE-based QSAR prediction models was evaluated using 24 benchmark datasets where SPE consistently either matched or outperformed atom-level and k-mer tokenization. Therefore, SPE could be a promising tokenization method for SMILES-based deep learning models. An open-source Python package SmilesPE was developed to implement this algorithm and is now freely available at https://github.com/XinhaoLi74/SmilesPE." @default.
- W3138781613 created "2021-03-29" @default.
- W3138781613 creator A5048357912 @default.
- W3138781613 creator A5072029339 @default.
- W3138781613 date "2021-03-15" @default.
- W3138781613 modified "2023-10-05" @default.
- W3138781613 title "SMILES Pair Encoding: A Data-Driven Substructure Tokenization Algorithm for Deep Learning" @default.
- W3138781613 cites W1832693441 @default.
- W3138781613 cites W1975147762 @default.
- W3138781613 cites W2008381136 @default.
- W3138781613 cites W2022476850 @default.
- W3138781613 cites W2038702914 @default.
- W3138781613 cites W2060531713 @default.
- W3138781613 cites W2096541451 @default.
- W3138781613 cites W2291927426 @default.
- W3138781613 cites W2578240541 @default.
- W3138781613 cites W2622206241 @default.
- W3138781613 cites W2790808809 @default.
- W3138781613 cites W2890097032 @default.
- W3138781613 cites W2914757825 @default.
- W3138781613 cites W2925830236 @default.
- W3138781613 cites W2945551948 @default.
- W3138781613 cites W2953128081 @default.
- W3138781613 cites W2962784628 @default.
- W3138781613 cites W2963026768 @default.
- W3138781613 cites W2964677890 @default.
- W3138781613 cites W2989615256 @default.
- W3138781613 cites W2997058986 @default.
- W3138781613 cites W3005353977 @default.
- W3138781613 cites W3006436762 @default.
- W3138781613 cites W3007309629 @default.
- W3138781613 cites W3023042104 @default.
- W3138781613 cites W3030978062 @default.
- W3138781613 cites W3032781902 @default.
- W3138781613 cites W3045928028 @default.
- W3138781613 cites W3174976929 @default.
- W3138781613 cites W4229590462 @default.
- W3138781613 cites W4245372219 @default.
- W3138781613 doi "https://doi.org/10.1021/acs.jcim.0c01127" @default.
- W3138781613 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/33715361" @default.
- W3138781613 hasPublicationYear "2021" @default.
- W3138781613 type Work @default.
- W3138781613 sameAs 3138781613 @default.
- W3138781613 citedByCount "33" @default.
- W3138781613 countsByYear W31387816132021 @default.
- W3138781613 countsByYear W31387816132022 @default.
- W3138781613 countsByYear W31387816132023 @default.
- W3138781613 crossrefType "journal-article" @default.
- W3138781613 hasAuthorship W3138781613A5048357912 @default.
- W3138781613 hasAuthorship W3138781613A5072029339 @default.
- W3138781613 hasBestOaLocation W31387816132 @default.
- W3138781613 hasConcept C11413529 @default.
- W3138781613 hasConcept C119857082 @default.
- W3138781613 hasConcept C154945302 @default.
- W3138781613 hasConcept C176982825 @default.
- W3138781613 hasConcept C177264268 @default.
- W3138781613 hasConcept C182407805 @default.
- W3138781613 hasConcept C199360897 @default.
- W3138781613 hasConcept C204321447 @default.
- W3138781613 hasConcept C41008148 @default.
- W3138781613 hasConcept C60644358 @default.
- W3138781613 hasConcept C68762167 @default.
- W3138781613 hasConcept C86803240 @default.
- W3138781613 hasConceptScore W3138781613C11413529 @default.
- W3138781613 hasConceptScore W3138781613C119857082 @default.
- W3138781613 hasConceptScore W3138781613C154945302 @default.
- W3138781613 hasConceptScore W3138781613C176982825 @default.
- W3138781613 hasConceptScore W3138781613C177264268 @default.
- W3138781613 hasConceptScore W3138781613C182407805 @default.
- W3138781613 hasConceptScore W3138781613C199360897 @default.
- W3138781613 hasConceptScore W3138781613C204321447 @default.
- W3138781613 hasConceptScore W3138781613C41008148 @default.
- W3138781613 hasConceptScore W3138781613C60644358 @default.
- W3138781613 hasConceptScore W3138781613C68762167 @default.
- W3138781613 hasConceptScore W3138781613C86803240 @default.
- W3138781613 hasFunder F4320332180 @default.
- W3138781613 hasFunder F4320338281 @default.
- W3138781613 hasIssue "4" @default.
- W3138781613 hasLocation W31387816131 @default.
- W3138781613 hasLocation W31387816132 @default.
- W3138781613 hasOpenAccess W3138781613 @default.
- W3138781613 hasPrimaryLocation W31387816131 @default.
- W3138781613 hasRelatedWork W1516839994 @default.
- W3138781613 hasRelatedWork W2048967369 @default.
- W3138781613 hasRelatedWork W2171830166 @default.
- W3138781613 hasRelatedWork W2355928363 @default.
- W3138781613 hasRelatedWork W2519019522 @default.
- W3138781613 hasRelatedWork W2961085424 @default.
- W3138781613 hasRelatedWork W3138781613 @default.
- W3138781613 hasRelatedWork W4286629047 @default.
- W3138781613 hasRelatedWork W4306674287 @default.
- W3138781613 hasRelatedWork W4224009465 @default.
- W3138781613 hasVolume "61" @default.
- W3138781613 isParatext "false" @default.
- W3138781613 isRetracted "false" @default.
- W3138781613 magId "3138781613" @default.
- W3138781613 workType "article" @default.
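
The abstract above describes SPE as two steps: learn a vocabulary of high-frequency SMILES substrings from a corpus, then tokenize new SMILES with that vocabulary on top of atom-level tokens. The sketch below illustrates that idea with a generic byte-pair-encoding-style merge loop. It is not the SmilesPE package's API or the authors' exact procedure; the function names (`atom_tokenize`, `learn_merges`, `merge_pair`, `spe_tokenize`), the simplified atom regex, and the `min_frequency` cutoff are all assumptions made for illustration.

```python
import re
from collections import Counter

# Simplified atom-level tokenizer (assumption): bracket atoms, the two-letter
# halogens Br and Cl, two-digit ring closures (%NN), else single characters.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def atom_tokenize(smiles):
    """Split a SMILES string into atom-level tokens."""
    return ATOM_PATTERN.findall(smiles)


def merge_pair(seq, left, right):
    """Replace each adjacent (left, right) pair in seq with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
            merged.append(left + right)
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_merges(corpus, num_merges, min_frequency=2):
    """Learn an ordered list of pair merges, most frequent adjacent pair first."""
    sequences = [atom_tokenize(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        (left, right), freq = pair_counts.most_common(1)[0]
        if freq < min_frequency:
            break  # remaining pairs are too rare to become vocabulary entries
        merges.append((left, right))
        sequences = [merge_pair(seq, left, right) for seq in sequences]
    return merges


def spe_tokenize(smiles, merges):
    """Tokenize a SMILES string by replaying the learned merges in order."""
    seq = atom_tokenize(smiles)
    for left, right in merges:
        seq = merge_pair(seq, left, right)
    return seq


if __name__ == "__main__":
    toy_corpus = ["CCO", "CCO", "CCN"]  # toy data, not ChEMBL
    toy_merges = learn_merges(toy_corpus, num_merges=1)
    print(toy_merges)
    print(spe_tokenize("CCO", toy_merges))
```

Because merges only concatenate neighbouring tokens, joining the output tokens always reproduces the input SMILES, so the substructure tokens stay human-readable. A real vocabulary would be learned from a large dataset with many more merges than this toy run.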