Matches in SemOpenAlex for { <https://semopenalex.org/work/W2109449939> ?p ?o ?g. }
- W2109449939 endingPage "2898" @default.
- W2109449939 startingPage "2884" @default.
- W2109449939 abstract "High-throughput structural proteomics is expected to generate considerable amounts of data on the progress of structure determination for many proteins. For each protein this includes information about cloning, expression, purification, biophysical characterization and structure determination via NMR spectroscopy or X-ray crystallography. It will be essential to develop specifications and ontologies for standardizing this information to make it amenable to retrospective analysis. To this end we created the SPINE database and analysis system for the Northeast Structural Genomics Consortium. SPINE, which is available at bioinfo.mbb.yale.edu/nesg or nesg.org, is specifically designed to enable distributed scientific collaboration via the Internet. It was designed not just as an information repository but as an active vehicle to standardize proteomics data in a form that would enable systematic data mining. The system features an intuitive user interface for interactive retrieval and modification of expression construct data, query forms designed to track global project progress and external links to many other resources. Currently the database contains experimental data on 985 constructs, of which 740 are drawn from Methanobacterium thermoautotrophicum, 123 from Saccharomyces cerevisiae, 93 from Caenorhabditis elegans and the remainder from other organisms. We developed a comprehensive set of data mining features for each protein, including several related to experimental progress (e.g. expression level, solubility and crystallization) and 42 based on the underlying protein sequence (e.g. amino acid composition, secondary structure and occurrence of low complexity regions). We demonstrate in detail the application of a particular machine learning approach, decision trees, to the tasks of predicting a protein’s solubility and propensity to crystallize based on sequence features. We are able to extract a number of key rules from our trees, in particular that soluble proteins tend to have significantly more acidic residues and fewer hydrophobic stretches than insoluble ones. One of the characteristics of proteomics data sets, currently and in the foreseeable future, is their intermediate size (∼500–5000 data points). This creates a number of issues in relation to error estimation. Initially we estimate the overall error in our trees based on standard cross-validation. However, this leaves out a significant fraction of the data in model construction and does not give error estimates on individual rules. Therefore, we present alternative methods to estimate the error in particular rules." @default.
- W2109449939 created "2016-06-24" @default.
- W2109449939 creator A5012117525 @default.
- W2109449939 creator A5014770642 @default.
- W2109449939 creator A5015710368 @default.
- W2109449939 creator A5020473435 @default.
- W2109449939 creator A5023094545 @default.
- W2109449939 creator A5042321575 @default.
- W2109449939 creator A5044542421 @default.
- W2109449939 creator A5052010181 @default.
- W2109449939 creator A5053246999 @default.
- W2109449939 creator A5058832587 @default.
- W2109449939 date "2001-07-01" @default.
- W2109449939 modified "2023-10-10" @default.
- W2109449939 title "SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics" @default.
- W2109449939 cites W1513332069 @default.
- W2109449939 cites W1539254844 @default.
- W2109449939 cites W1847611580 @default.
- W2109449939 cites W1977159876 @default.
- W2109449939 cites W1990453950 @default.
- W2109449939 cites W1997268601 @default.
- W2109449939 cites W2003144438 @default.
- W2109449939 cites W2004831316 @default.
- W2109449939 cites W2016620689 @default.
- W2109449939 cites W2042773270 @default.
- W2109449939 cites W2054774141 @default.
- W2109449939 cites W2085277871 @default.
- W2109449939 cites W2095450147 @default.
- W2109449939 cites W2103425480 @default.
- W2109449939 cites W2105254477 @default.
- W2109449939 cites W2108067237 @default.
- W2109449939 cites W2110041134 @default.
- W2109449939 cites W2110668908 @default.
- W2109449939 cites W2126742636 @default.
- W2109449939 cites W2130479394 @default.
- W2109449939 cites W2137786672 @default.
- W2109449939 cites W2142628689 @default.
- W2109449939 cites W2144112248 @default.
- W2109449939 cites W2155320925 @default.
- W2109449939 cites W2160013892 @default.
- W2109449939 cites W2162891889 @default.
- W2109449939 cites W2164082161 @default.
- W2109449939 cites W2164718466 @default.
- W2109449939 cites W2170652857 @default.
- W2109449939 cites W4213149192 @default.
- W2109449939 cites W4247964656 @default.
- W2109449939 cites W2030365715 @default.
- W2109449939 doi "https://doi.org/10.1093/nar/29.13.2884" @default.
- W2109449939 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/55760" @default.
- W2109449939 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/11433035" @default.
- W2109449939 hasPublicationYear "2001" @default.
- W2109449939 type Work @default.
- W2109449939 sameAs 2109449939 @default.
- W2109449939 citedByCount "113" @default.
- W2109449939 countsByYear W21094499392012 @default.
- W2109449939 countsByYear W21094499392013 @default.
- W2109449939 countsByYear W21094499392014 @default.
- W2109449939 countsByYear W21094499392015 @default.
- W2109449939 countsByYear W21094499392017 @default.
- W2109449939 countsByYear W21094499392018 @default.
- W2109449939 countsByYear W21094499392019 @default.
- W2109449939 countsByYear W21094499392021 @default.
- W2109449939 countsByYear W21094499392022 @default.
- W2109449939 crossrefType "journal-article" @default.
- W2109449939 hasAuthorship W2109449939A5012117525 @default.
- W2109449939 hasAuthorship W2109449939A5014770642 @default.
- W2109449939 hasAuthorship W2109449939A5015710368 @default.
- W2109449939 hasAuthorship W2109449939A5020473435 @default.
- W2109449939 hasAuthorship W2109449939A5023094545 @default.
- W2109449939 hasAuthorship W2109449939A5042321575 @default.
- W2109449939 hasAuthorship W2109449939A5044542421 @default.
- W2109449939 hasAuthorship W2109449939A5052010181 @default.
- W2109449939 hasAuthorship W2109449939A5053246999 @default.
- W2109449939 hasAuthorship W2109449939A5058832587 @default.
- W2109449939 hasBestOaLocation W21094499391 @default.
- W2109449939 hasConcept C104317684 @default.
- W2109449939 hasConcept C113843644 @default.
- W2109449939 hasConcept C124101348 @default.
- W2109449939 hasConcept C129307140 @default.
- W2109449939 hasConcept C136475424 @default.
- W2109449939 hasConcept C141231307 @default.
- W2109449939 hasConcept C157764524 @default.
- W2109449939 hasConcept C157915830 @default.
- W2109449939 hasConcept C173608175 @default.
- W2109449939 hasConcept C177264268 @default.
- W2109449939 hasConcept C189206191 @default.
- W2109449939 hasConcept C192772702 @default.
- W2109449939 hasConcept C199360897 @default.
- W2109449939 hasConcept C23123220 @default.
- W2109449939 hasConcept C2780801425 @default.
- W2109449939 hasConcept C41008148 @default.
- W2109449939 hasConcept C41584329 @default.
- W2109449939 hasConcept C46111723 @default.
- W2109449939 hasConcept C47701112 @default.
- W2109449939 hasConcept C55493867 @default.
- W2109449939 hasConcept C555944384 @default.
- W2109449939 hasConcept C60644358 @default.
- W2109449939 hasConcept C70721500 @default.
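
Each row above has the shape "- SUBJECT PREDICATE OBJECT @GRAPH.", where the object is either a quoted literal or a bare identifier. As a rough illustration only, the rows could be grouped per predicate with a small parser like the one below. The names `ROW` and `parse_rows` are hypothetical and not part of SemOpenAlex, and the sketch assumes that no quoted literal contains a line beginning with "- ".

```python
import re
from collections import defaultdict

# Hypothetical pattern for one listing row: "- SUBJECT PREDICATE OBJECT @GRAPH."
# OBJECT is either a double-quoted literal or a bare token such as an entity ID.
ROW = re.compile(r'^- (\S+) (\S+) (?:"(.*)"|(\S+)) @(\S+)\.$', re.DOTALL)

def parse_rows(text):
    """Group objects by subject and predicate for every recognised row."""
    record = defaultdict(lambda: defaultdict(list))
    # A new row starts with "- " at the beginning of a line; a quoted literal
    # may continue over several lines, so split only at those row starts.
    for chunk in re.split(r'\n(?=- )', text.strip()):
        m = ROW.match(chunk.strip())
        if m is None:
            continue  # header line or anything that is not a row
        subject, predicate, literal, token, _graph = m.groups()
        record[subject][predicate].append(literal if literal is not None else token)
    return record

# A few rows copied from the listing above.
sample = '''Matches in SemOpenAlex for { <https://semopenalex.org/work/W2109449939> ?p ?o ?g. }
- W2109449939 endingPage "2898" @default.
- W2109449939 startingPage "2884" @default.
- W2109449939 cites W1513332069 @default.
- W2109449939 cites W1539254844 @default.
- W2109449939 type Work @default.'''

work = parse_rows(sample)["W2109449939"]
print(work["startingPage"], work["endingPage"], len(work["cites"]))
```

Repeated predicates such as cites, creator and hasConcept accumulate into lists, so a single lookup returns every object recorded for that predicate.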