Matches in SemOpenAlex for { <https://semopenalex.org/work/W4225415057> ?p ?o ?g. }
- W4225415057 abstract "The identification of drug/compound-target interactions (DTIs) constitutes the basis of drug discovery, for which computational predictive approaches have been applied. As a relatively new data-driven paradigm, proteochemometric (PCM) modeling utilizes both protein and compound properties as a pair at the input level and processes them via statistical/machine learning. The representation of input samples (i.e., proteins and their ligands) in the form of quantitative feature vectors is crucial for the extraction of interaction-related properties during the artificial learning and subsequent prediction of DTIs. Lately, the representation learning approach, in which input samples are automatically featurized via training and applying a machine/deep learning model, has been utilized in biomedical sciences. In this study, we performed a comprehensive investigation of different computational approaches/techniques for data preparation and protein featurization, including both conventional approaches and the novel learned embeddings, with the aim of achieving better data representations and more successful learning in PCM-based DTI prediction. For this, we first constructed realistic and challenging benchmark datasets on small, medium, and large scales to be used as reliable gold standards for specific DTI modeling tasks. We developed and applied a network analysis-based splitting strategy to divide datasets into structurally different training and test folds. Using these datasets together with various featurization methods, we trained and tested DTI prediction models and evaluated their performance from different angles.
Our main findings can be summarized in three points: (i) random splitting of the dataset into train and test folds leads to near-complete data memorization and produces highly over-optimistic results, so it should be avoided; (ii) learned protein sequence embeddings work well in DTI prediction, even though no information related to protein structures, interactions or biochemical properties is utilized during the training of these models; and (iii) PCM models tend to learn from compound features and leave out protein features, mostly due to the natural bias in DTI data. We hope this study will aid researchers in designing robust and high-performing data-driven DTI prediction systems that have real-world translational value in drug discovery." @default.
- W4225415057 created "2022-05-05" @default.
- W4225415057 creator A5003981652 @default.
- W4225415057 creator A5054058101 @default.
- W4225415057 date "2022-05-01" @default.
- W4225415057 modified "2023-10-17" @default.
- W4225415057 title "How to Best Represent Proteins in Machine Learning-based Prediction of Drug/Compound-Target Interactions" @default.
- W4225415057 cites W1965092590 @default.
- W4225415057 cites W1982267716 @default.
- W4225415057 cites W1983478747 @default.
- W4225415057 cites W1988037271 @default.
- W4225415057 cites W1993711987 @default.
- W4225415057 cites W1996423252 @default.
- W4225415057 cites W2000671825 @default.
- W4225415057 cites W2007965730 @default.
- W4225415057 cites W2011301426 @default.
- W4225415057 cites W2019488416 @default.
- W4225415057 cites W2033882981 @default.
- W4225415057 cites W2042084565 @default.
- W4225415057 cites W2043338013 @default.
- W4225415057 cites W2060300932 @default.
- W4225415057 cites W2068819452 @default.
- W4225415057 cites W2070930739 @default.
- W4225415057 cites W2079745490 @default.
- W4225415057 cites W2083582778 @default.
- W4225415057 cites W2086286404 @default.
- W4225415057 cites W2093788819 @default.
- W4225415057 cites W2094403468 @default.
- W4225415057 cites W2100076566 @default.
- W4225415057 cites W2101133358 @default.
- W4225415057 cites W2102461176 @default.
- W4225415057 cites W2129434099 @default.
- W4225415057 cites W2131681506 @default.
- W4225415057 cites W2132292391 @default.
- W4225415057 cites W2142325821 @default.
- W4225415057 cites W2147863530 @default.
- W4225415057 cites W2148145769 @default.
- W4225415057 cites W2174991771 @default.
- W4225415057 cites W2268071782 @default.
- W4225415057 cites W2324589236 @default.
- W4225415057 cites W2411478524 @default.
- W4225415057 cites W2500660203 @default.
- W4225415057 cites W2531200345 @default.
- W4225415057 cites W2558999090 @default.
- W4225415057 cites W2594183968 @default.
- W4225415057 cites W2612935389 @default.
- W4225415057 cites W2740946158 @default.
- W4225415057 cites W2766155572 @default.
- W4225415057 cites W2785947426 @default.
- W4225415057 cites W2793168264 @default.
- W4225415057 cites W2806547269 @default.
- W4225415057 cites W2806953728 @default.
- W4225415057 cites W2807567459 @default.
- W4225415057 cites W2886544065 @default.
- W4225415057 cites W2887905891 @default.
- W4225415057 cites W2898402099 @default.
- W4225415057 cites W2952209560 @default.
- W4225415057 cites W2969180072 @default.
- W4225415057 cites W2980789587 @default.
- W4225415057 cites W2995514860 @default.
- W4225415057 cites W2999481648 @default.
- W4225415057 cites W2999554466 @default.
- W4225415057 cites W3004295003 @default.
- W4225415057 cites W3014805132 @default.
- W4225415057 cites W3024894813 @default.
- W4225415057 cites W3093397714 @default.
- W4225415057 cites W3120392574 @default.
- W4225415057 cites W3150635270 @default.
- W4225415057 cites W3176163250 @default.
- W4225415057 cites W3216192226 @default.
- W4225415057 cites W3217661426 @default.
- W4225415057 cites W4240082606 @default.
- W4225415057 cites W4249920046 @default.
- W4225415057 doi "https://doi.org/10.1101/2022.05.01.490207" @default.
- W4225415057 hasPublicationYear "2022" @default.
- W4225415057 type Work @default.
- W4225415057 citedByCount "0" @default.
- W4225415057 crossrefType "posted-content" @default.
- W4225415057 hasAuthorship W4225415057A5003981652 @default.
- W4225415057 hasAuthorship W4225415057A5054058101 @default.
- W4225415057 hasBestOaLocation W42254150571 @default.
- W4225415057 hasConcept C108583219 @default.
- W4225415057 hasConcept C116409475 @default.
- W4225415057 hasConcept C116834253 @default.
- W4225415057 hasConcept C119857082 @default.
- W4225415057 hasConcept C124101348 @default.
- W4225415057 hasConcept C13280743 @default.
- W4225415057 hasConcept C153180895 @default.
- W4225415057 hasConcept C154945302 @default.
- W4225415057 hasConcept C17744445 @default.
- W4225415057 hasConcept C185798385 @default.
- W4225415057 hasConcept C199539241 @default.
- W4225415057 hasConcept C205649164 @default.
- W4225415057 hasConcept C2776359362 @default.
- W4225415057 hasConcept C41008148 @default.
- W4225415057 hasConcept C50644808 @default.
- W4225415057 hasConcept C59822182 @default.
- W4225415057 hasConcept C86803240 @default.
- W4225415057 hasConcept C94625758 @default.
- W4225415057 hasConceptScore W4225415057C108583219 @default.