Matches in SemOpenAlex for { <https://semopenalex.org/work/W3132887202> ?p ?o ?g. }
Showing items 1 to 51 of 51 with 100 items per page.
- W3132887202 abstract "Proteins are molecular machines that play almost every fundamental role in the activities of life. Their biological functions are mostly driven by conformational transitions and by interaction interfaces with other biomolecules such as DNA sequences, proteins and other ligands. In a quest for the mechanisms underlying protein function, I conducted two projects aiming, firstly, to explore the structural change of proteins by identifying their rigid bodies, and secondly, to devise new sequence-based features to predict DNA-binding sites in proteins. Despite many previous efforts to calculate rigid domains in proteins, it is still highly desirable to develop new segmentation algorithms that can efficiently segment proteins at high throughput while avoiding protein-dependent parameter tuning, such as setting the number of rigid domains. Thus, I introduce a new rigid domain segmentation method in which a graph whose vertices are amino acids represents multiple conformational states of a protein. This graph is then reduced by coarse graining, for example with the Louvain clustering algorithm. Afterward, the domain-wise relationships among clusters in the reduced graph are inferred through a binary labeling of its edges, which becomes feasible thanks to the line graph transformation and the generalized Viterbi algorithm. Because of the binary labeling, our method does not require the number of rigid domains as an input parameter, unlike other existing methods. I validate our graph-based method on 487 examples from the DynDom database and compare our segments with those of other methods on several proteins whose structural changes range from medium to large and whose molecular motions have been studied extensively in the literature. The algorithm code and usage instructions are available at https://github.com/dtklinh/GBRDE.
In the second project, DNA-binding sites in proteins can be identified through either structure-based or sequence-based approaches. Although they obtain good results, structure-based methods require protein 3D structures, which are expensive and time-consuming to determine. In contrast, sequence-based methods are efficiently applicable to entire protein databases, yet demand carefully designed features. Thus, I present a new information-theoretic feature derived from the Jensen–Shannon Divergence (JSD), which captures the differences between the amino acid distributions of binding and non-binding sites. For the evaluation, I ran a five-fold cross-validation on 263 proteins with a Random Forest (RF) classifier, using features comprising our new sequence-based feature and several popular ones such as the position-specific scoring matrix (PSSM), orthogonal binary vector (OBV), and secondary structure (SS). The results show that concatenating our features significantly improves RF classifier performance in terms of sensitivity and Matthews correlation coefficient (MCC)." @default.
- W3132887202 created "2021-03-01" @default.
- W3132887202 creator A5034926698 @default.
- W3132887202 date "2022-02-21" @default.
- W3132887202 modified "2023-09-24" @default.
- W3132887202 title "Probabilistic Models to Detect Important Sites in Proteins" @default.
- W3132887202 doi "https://doi.org/10.53846/goediss-8461" @default.
- W3132887202 hasPublicationYear "2022" @default.
- W3132887202 type Work @default.
- W3132887202 sameAs 3132887202 @default.
- W3132887202 citedByCount "0" @default.
- W3132887202 crossrefType "dissertation" @default.
- W3132887202 hasAuthorship W3132887202A5034926698 @default.
- W3132887202 hasBestOaLocation W31328872021 @default.
- W3132887202 hasConcept C11413529 @default.
- W3132887202 hasConcept C132525143 @default.
- W3132887202 hasConcept C134306372 @default.
- W3132887202 hasConcept C154945302 @default.
- W3132887202 hasConcept C33923547 @default.
- W3132887202 hasConcept C36503486 @default.
- W3132887202 hasConcept C41008148 @default.
- W3132887202 hasConcept C49937458 @default.
- W3132887202 hasConcept C73555534 @default.
- W3132887202 hasConcept C80444323 @default.
- W3132887202 hasConceptScore W3132887202C11413529 @default.
- W3132887202 hasConceptScore W3132887202C132525143 @default.
- W3132887202 hasConceptScore W3132887202C134306372 @default.
- W3132887202 hasConceptScore W3132887202C154945302 @default.
- W3132887202 hasConceptScore W3132887202C33923547 @default.
- W3132887202 hasConceptScore W3132887202C36503486 @default.
- W3132887202 hasConceptScore W3132887202C41008148 @default.
- W3132887202 hasConceptScore W3132887202C49937458 @default.
- W3132887202 hasConceptScore W3132887202C73555534 @default.
- W3132887202 hasConceptScore W3132887202C80444323 @default.
- W3132887202 hasLocation W31328872021 @default.
- W3132887202 hasOpenAccess W3132887202 @default.
- W3132887202 hasPrimaryLocation W31328872021 @default.
- W3132887202 hasRelatedWork W11688267 @default.
- W3132887202 hasRelatedWork W11694923 @default.
- W3132887202 hasRelatedWork W17508780 @default.
- W3132887202 hasRelatedWork W2440223 @default.
- W3132887202 hasRelatedWork W4109773 @default.
- W3132887202 hasRelatedWork W6347445 @default.
- W3132887202 hasRelatedWork W6380806 @default.
- W3132887202 hasRelatedWork W8447479 @default.
- W3132887202 hasRelatedWork W9886305 @default.
- W3132887202 hasRelatedWork W9547927 @default.
- W3132887202 isParatext "false" @default.
- W3132887202 isRetracted "false" @default.
- W3132887202 magId "3132887202" @default.
- W3132887202 workType "dissertation" @default.