Matches in SemOpenAlex for { <https://semopenalex.org/work/W4200306039> ?p ?o ?g. }
- W4200306039 endingPage "257" @default.
- W4200306039 startingPage "240" @default.
- W4200306039 abstract "Recent advances in deep learning have enabled the development of large-scale multimodal models for virtual screening and de novo molecular design. The human kinome with its abundant sequence and inhibitor data presents an attractive opportunity to develop proteochemometric models that exploit the size and internal diversity of this family of targets. Here, we challenge a standard practice in sequence-based affinity prediction models: instead of leveraging the full primary structure of proteins, each target is represented by a sequence of 29 discontiguous residues defining the ATP binding site. In kinase-ligand binding affinity prediction, our results show that the reduced active site sequence representation is not only computationally more efficient but consistently yields significantly higher performance than the full primary structure. This trend persists across different models, data sets, and performance metrics and holds true when predicting pIC50 for both unseen ligands and kinases. Our interpretability analysis reveals a potential explanation for the superiority of the active site models: whereas only mild statistical effects about the extraction of three-dimensional (3D) interaction sites take place in the full sequence models, the active site models are equipped with an implicit but strong inductive bias about the 3D structure stemming from the discontiguity of the active sites. Moreover, in direct comparisons, our models perform similarly to or better than previous state-of-the-art approaches in affinity prediction. We then investigate a de novo molecular design task and find that the active site provides benefits in computational efficiency, but otherwise, both kinase representations yield similar optimized affinities (for both SMILES- and SELFIES-based molecular generators). Our work challenges the assumption that the full primary structure is indispensable for modeling human kinases." @default.
- W4200306039 created "2021-12-31" @default.
- W4200306039 creator A5002562002 @default.
- W4200306039 creator A5004051017 @default.
- W4200306039 creator A5005561269 @default.
- W4200306039 creator A5044514479 @default.
- W4200306039 creator A5071633654 @default.
- W4200306039 date "2021-12-14" @default.
- W4200306039 modified "2023-10-02" @default.
- W4200306039 title "Active Site Sequence Representations of Human Kinases Outperform Full Sequence Representations for Affinity Prediction and Inhibitor Generation: 3D Effects in a 1D Model" @default.
- W4200306039 cites W1510052597 @default.
- W4200306039 cites W1837139339 @default.
- W4200306039 cites W1891334654 @default.
- W4200306039 cites W1918079945 @default.
- W4200306039 cites W1966285532 @default.
- W4200306039 cites W1971590501 @default.
- W4200306039 cites W1971606559 @default.
- W4200306039 cites W1975147762 @default.
- W4200306039 cites W1975875968 @default.
- W4200306039 cites W1988037271 @default.
- W4200306039 cites W1993438476 @default.
- W4200306039 cites W2005361085 @default.
- W4200306039 cites W2017254121 @default.
- W4200306039 cites W2030205108 @default.
- W4200306039 cites W2031441006 @default.
- W4200306039 cites W2042553090 @default.
- W4200306039 cites W2062289724 @default.
- W4200306039 cites W2138778824 @default.
- W4200306039 cites W2143210482 @default.
- W4200306039 cites W2204695023 @default.
- W4200306039 cites W2233361891 @default.
- W4200306039 cites W2471196942 @default.
- W4200306039 cites W2540540415 @default.
- W4200306039 cites W2714724074 @default.
- W4200306039 cites W2734982589 @default.
- W4200306039 cites W2784213390 @default.
- W4200306039 cites W2785947426 @default.
- W4200306039 cites W2790808809 @default.
- W4200306039 cites W2806117056 @default.
- W4200306039 cites W2806290811 @default.
- W4200306039 cites W2806437034 @default.
- W4200306039 cites W2806547269 @default.
- W4200306039 cites W2807792492 @default.
- W4200306039 cites W2809216727 @default.
- W4200306039 cites W2860192827 @default.
- W4200306039 cites W2869298098 @default.
- W4200306039 cites W2892653600 @default.
- W4200306039 cites W2912212024 @default.
- W4200306039 cites W2915792373 @default.
- W4200306039 cites W2918239264 @default.
- W4200306039 cites W2918335507 @default.
- W4200306039 cites W2939590817 @default.
- W4200306039 cites W2942173509 @default.
- W4200306039 cites W2945551948 @default.
- W4200306039 cites W2956961449 @default.
- W4200306039 cites W2964206522 @default.
- W4200306039 cites W2969996838 @default.
- W4200306039 cites W2971690404 @default.
- W4200306039 cites W2978484973 @default.
- W4200306039 cites W2985931096 @default.
- W4200306039 cites W2990099866 @default.
- W4200306039 cites W2997958114 @default.
- W4200306039 cites W3000369508 @default.
- W4200306039 cites W3019745511 @default.
- W4200306039 cites W3021453944 @default.
- W4200306039 cites W3024375027 @default.
- W4200306039 cites W3028589594 @default.
- W4200306039 cites W3029836473 @default.
- W4200306039 cites W3030285818 @default.
- W4200306039 cites W3037888463 @default.
- W4200306039 cites W3043461363 @default.
- W4200306039 cites W3045928028 @default.
- W4200306039 cites W3087418839 @default.
- W4200306039 cites W3098269892 @default.
- W4200306039 cites W3103171309 @default.
- W4200306039 cites W3104705366 @default.
- W4200306039 cites W3130227682 @default.
- W4200306039 cites W3133523400 @default.
- W4200306039 cites W3135935512 @default.
- W4200306039 cites W3158688028 @default.
- W4200306039 cites W3160420762 @default.
- W4200306039 cites W3161074857 @default.
- W4200306039 cites W3162155458 @default.
- W4200306039 cites W3163209783 @default.
- W4200306039 cites W3164680581 @default.
- W4200306039 cites W3186744718 @default.
- W4200306039 cites W3189729408 @default.
- W4200306039 cites W3203810337 @default.
- W4200306039 cites W4212883601 @default.
- W4200306039 cites W4230428500 @default.
- W4200306039 cites W4295216797 @default.
- W4200306039 doi "https://doi.org/10.1021/acs.jcim.1c00889" @default.
- W4200306039 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/34905358" @default.
- W4200306039 hasPublicationYear "2021" @default.
- W4200306039 type Work @default.
- W4200306039 citedByCount "11" @default.
- W4200306039 countsByYear W42003060392022 @default.
- W4200306039 countsByYear W42003060392023 @default.