Matches in SemOpenAlex for { <https://semopenalex.org/work/W2073886101> ?p ?o ?g. }
Showing items 1 to 93 of 93 with 100 items per page.
- W2073886101 endingPage "467" @default.
- W2073886101 startingPage "461" @default.
- W2073886101 abstract "Successful genome mining is dependent on accurate prediction of protein function from sequence. This often involves dividing protein families into functional subtypes (e.g., with different substrates). In many cases, there are only a small number of known functional subtypes, but in the case of the adenylation domains of nonribosomal peptide synthetases (NRPS), there are >500 known substrates. Latent semantic indexing (LSI) was originally developed for text processing but has also been used to assign proteins to families. Proteins are treated as 'documents' and it is necessary to encode properties of the amino acid sequence as 'terms' in order to construct a term-document matrix, which counts the terms in each document. This matrix is then processed to produce a document-concept matrix, where each protein is represented as a row vector. A standard measure of the closeness of vectors to each other (cosines of the angle between them) provides a measure of protein similarity. Previous work encoded proteins as oligopeptide terms, i.e. counted oligopeptides, but used no information regarding location of oligopeptides in the proteins. A novel tokenization method was developed to analyze information from multiple alignments. LSI successfully distinguished between two functional subtypes in five well-characterized families. Visualization of different 'concept' dimensions allows exploration of the structure of protein families. LSI was also used to predict the amino acid substrate of adenylation domains of NRPS. Better results were obtained when selected residues from multiple alignments were used rather than the total sequence of the adenylation domains. Using ten residues from the substrate binding pocket performed better than using 34 residues within 8 Å of the active site. Prediction efficiency was somewhat better than that of the best published method using a support vector machine." @default.
- W2073886101 created "2016-06-24" @default.
- W2073886101 creator A5013443677 @default.
- W2073886101 creator A5019425779 @default.
- W2073886101 creator A5045920121 @default.
- W2073886101 creator A5046165463 @default.
- W2073886101 creator A5060612718 @default.
- W2073886101 creator A5064728359 @default.
- W2073886101 creator A5064874095 @default.
- W2073886101 creator A5071228024 @default.
- W2073886101 date "2014-02-01" @default.
- W2073886101 modified "2023-10-18" @default.
- W2073886101 title "Predicting substrate specificity of adenylation domains of nonribosomal peptide synthetases and other protein properties by latent semantic indexing" @default.
- W2073886101 cites W1985381418 @default.
- W2073886101 cites W1996112765 @default.
- W2073886101 cites W2062185057 @default.
- W2073886101 cites W2070492505 @default.
- W2073886101 cites W2080895748 @default.
- W2073886101 cites W2098165162 @default.
- W2073886101 cites W2114839447 @default.
- W2073886101 cites W2131887397 @default.
- W2073886101 cites W2137015675 @default.
- W2073886101 cites W2139919097 @default.
- W2073886101 cites W2142678478 @default.
- W2073886101 cites W2156202011 @default.
- W2073886101 doi "https://doi.org/10.1007/s10295-013-1322-2" @default.
- W2073886101 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/24104398" @default.
- W2073886101 hasPublicationYear "2014" @default.
- W2073886101 type Work @default.
- W2073886101 sameAs 2073886101 @default.
- W2073886101 citedByCount "33" @default.
- W2073886101 countsByYear W20738861012014 @default.
- W2073886101 countsByYear W20738861012015 @default.
- W2073886101 countsByYear W20738861012016 @default.
- W2073886101 countsByYear W20738861012017 @default.
- W2073886101 countsByYear W20738861012018 @default.
- W2073886101 countsByYear W20738861012019 @default.
- W2073886101 countsByYear W20738861012020 @default.
- W2073886101 countsByYear W20738861012021 @default.
- W2073886101 countsByYear W20738861012022 @default.
- W2073886101 countsByYear W20738861012023 @default.
- W2073886101 crossrefType "journal-article" @default.
- W2073886101 hasAuthorship W2073886101A5013443677 @default.
- W2073886101 hasAuthorship W2073886101A5019425779 @default.
- W2073886101 hasAuthorship W2073886101A5045920121 @default.
- W2073886101 hasAuthorship W2073886101A5046165463 @default.
- W2073886101 hasAuthorship W2073886101A5060612718 @default.
- W2073886101 hasAuthorship W2073886101A5064728359 @default.
- W2073886101 hasAuthorship W2073886101A5064874095 @default.
- W2073886101 hasAuthorship W2073886101A5071228024 @default.
- W2073886101 hasBestOaLocation W20738861011 @default.
- W2073886101 hasConcept C104317684 @default.
- W2073886101 hasConcept C154945302 @default.
- W2073886101 hasConcept C160403918 @default.
- W2073886101 hasConcept C170133592 @default.
- W2073886101 hasConcept C2777379556 @default.
- W2073886101 hasConcept C41008148 @default.
- W2073886101 hasConcept C54355233 @default.
- W2073886101 hasConcept C553450214 @default.
- W2073886101 hasConcept C70721500 @default.
- W2073886101 hasConcept C86803240 @default.
- W2073886101 hasConceptScore W2073886101C104317684 @default.
- W2073886101 hasConceptScore W2073886101C154945302 @default.
- W2073886101 hasConceptScore W2073886101C160403918 @default.
- W2073886101 hasConceptScore W2073886101C170133592 @default.
- W2073886101 hasConceptScore W2073886101C2777379556 @default.
- W2073886101 hasConceptScore W2073886101C41008148 @default.
- W2073886101 hasConceptScore W2073886101C54355233 @default.
- W2073886101 hasConceptScore W2073886101C553450214 @default.
- W2073886101 hasConceptScore W2073886101C70721500 @default.
- W2073886101 hasConceptScore W2073886101C86803240 @default.
- W2073886101 hasIssue "2" @default.
- W2073886101 hasLocation W20738861011 @default.
- W2073886101 hasLocation W20738861012 @default.
- W2073886101 hasOpenAccess W2073886101 @default.
- W2073886101 hasPrimaryLocation W20738861011 @default.
- W2073886101 hasRelatedWork W1977641695 @default.
- W2073886101 hasRelatedWork W2023650832 @default.
- W2073886101 hasRelatedWork W2064612440 @default.
- W2073886101 hasRelatedWork W2073886101 @default.
- W2073886101 hasRelatedWork W2111520150 @default.
- W2073886101 hasRelatedWork W2422809474 @default.
- W2073886101 hasRelatedWork W2791410839 @default.
- W2073886101 hasRelatedWork W2808584132 @default.
- W2073886101 hasRelatedWork W2895414000 @default.
- W2073886101 hasRelatedWork W4286698801 @default.
- W2073886101 hasVolume "41" @default.
- W2073886101 isParatext "false" @default.
- W2073886101 isRetracted "false" @default.
- W2073886101 magId "2073886101" @default.
- W2073886101 workType "article" @default.
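
The abstract above outlines a pipeline: count oligopeptide 'terms' per protein 'document', build a term-document matrix, reduce it to a document-concept matrix, and compare proteins by the cosine of the angle between their row vectors. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes a plain truncated SVD as the reduction step and overlapping 2-residue oligopeptides as terms; the function names, the toy sequences, and the choice of two concept dimensions are all hypothetical and chosen only for illustration. It does not reproduce the alignment-based tokenization the abstract mentions.

```python
from collections import Counter

import numpy as np


def tokenize(seq, k=2):
    """Split a sequence into overlapping oligopeptide terms of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def term_document_matrix(seqs, k=2):
    """Return (vocabulary, matrix) with one row per term, one column per protein."""
    counts = [Counter(tokenize(s, k)) for s in seqs]
    vocab = sorted(set().union(*counts))
    matrix = np.array([[c[t] for c in counts] for t in vocab], dtype=float)
    return vocab, matrix


def document_concept_matrix(matrix, n_concepts=2):
    """Project proteins into concept space using a truncated SVD (assumed here)."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Each row of the result is one protein expressed in the top concepts.
    return vt[:n_concepts].T * s[:n_concepts]


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical toy sequences: two loosely defined "families" with no shared terms.
toy_seqs = ["ACDEACDE", "ACDEACDF", "WYWYWYWY", "WYWYWYWF"]
vocab, tdm = term_document_matrix(toy_seqs, k=2)
docs = document_concept_matrix(tdm, n_concepts=2)

print("within family: ", cosine(docs[0], docs[1]))
print("across families:", cosine(docs[0], docs[2]))
```

In this toy setup, sequences that share many oligopeptides should end up close together in concept space, while sequences with no terms in common should be near orthogonal. Real use would need many more sequences and a considered choice of term encoding and number of concept dimensions.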