Matches in SemOpenAlex for { <https://semopenalex.org/work/W2007176869> ?p ?o ?g. }
- W2007176869 endingPage "670" @default.
- W2007176869 startingPage "652" @default.
- W2007176869 abstract "In mass spectrometry-based proteomics, frequently hundreds of thousands of MS/MS spectra are collected in a single experiment. Of these, a relatively small fraction is confidently assigned to peptide sequences, whereas the majority of the spectra are not further analyzed. Spectra are not assigned to peptides for diverse reasons. These include deficiencies of the scoring schemes implemented in the database search tools, sequence variations (e.g. single nucleotide polymorphisms) or omissions in the database searched, post-translational or chemical modifications of the peptide analyzed, or the observation of sequences that are not anticipated from the genomic sequence (e.g. splice forms, somatic rearrangement, and processed proteins). To increase the amount of information that can be extracted from proteomic MS/MS datasets we developed a robust method that detects high quality spectra within the fraction of spectra unassigned by conventional sequence database searching and computes a quality score for each spectrum. We also demonstrate that iterative search strategies applied to such detected unassigned high quality spectra significantly increase the number of spectra that can be assigned from datasets and that biologically interesting new insights can be gained from existing data.
Proteomics, the systematic identification and characterization of all proteins expressed in a cell, has become a key analytical approach in the life sciences (1. Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207). The dramatic progress of proteomic research over the last decade has been catalyzed by several, seemingly independent developments. First, the wealth of genomic sequence information generated by large scale sequencing projects and the development of computational gene prediction and annotation tools have produced sequence databases that are expected to contain most coding gene regions. These databases can be searched with proteomic data and constrain the proteomic search space (2. Apweiler R. Bairoch A. Wu C.H. Protein sequence databases. Curr. Opin. Chem. Biol. 2004; 8: 76-80). Second, technological improvements in mass spectrometry and peptide and protein separation techniques allow rapid and sensitive protein identification from minute amounts of complex biological samples (for reviews, see Refs.
1. Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207, 3. Yates J.R. Mass spectral analysis in proteomics. Annu. Rev. Biophys. Biomol. Struct. 2004; 33: 297-316, and 4. Ferguson P.L. Smith R.D. Proteome analysis by mass spectrometry. Annu. Rev. Biophys. Biomol. Struct. 2003; 32: 399-424). Third, the development of computational tools for the assignment of MS/MS spectra to peptide sequences and the statistical validation of these assignments support the consistent analysis of large datasets with no or minimal human intervention (5. Nesvizhskii A.I. Aebersold R. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov. Today. 2004; 9: 173-181). Collectively these developments resulted in the emergence of shotgun proteomics, a strategy based on the combination of tandem mass spectrometry-based peptide sequencing and sequence database searching, which now routinely permits the identification of hundreds to thousands of proteins in a single experiment. Shotgun proteomics creates significant computational challenges (5. Nesvizhskii A.I. Aebersold R. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov. Today. 2004; 9: 173-181, 6. Patterson S.D. Data analysis—the Achilles heel of proteomics. Nat. Biotechnol. 2003; 21: 221-222, 7. Johnson R.S. Davis M.T. Taylor J.A. Patterson S.D. Informatics for protein identification by mass spectrometry. Methods. 2005; 35: 223-236, 8. Russell S.A. Old W. Resing K.A. Hunter L. Proteomic informatics. Int. Rev. Neurobiol. 2004; 61: 129-157).
Large numbers (on the order of 10^5) of MS/MS spectra acquired in each experiment need to be computationally processed to identify peptides that produced them and to infer what proteins were present in the original sample. In most high throughput studies, peptide identification is performed by searching MS/MS spectra against protein sequence databases. A number of automated database search tools have been developed for that purpose, including commercial and open source programs (9. Eng J.K. McCormack A.L. Yates J.R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989, 10. Mann M. Wilm M. Error-tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 1994; 66: 4390-4399, 11. Perkins D.N. Pappin D.J.C. Creasy D.M. Cottrell J.C. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999; 20: 3551-3567, 12. Clauser K.R. Baker P. Burlingame A.L. Role of accurate mass measurement (±10 ppm) in protein identification strategies employing MS or MS/MS and database searching. Anal. Chem. 1999; 71: 2871-2882, 13. Field H.I. Fenyo D. Beavis R.C. RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimizes protein identification, and archives data in a relational database. Proteomics. 2002; 2: 36-47, 14. Zhang N. Aebersold R. Schwikowski B. ProbID: a probabilistic algorithm to identify peptides through sequence database searching using tandem mass spectral data. Proteomics. 2002; 2: 1406-1412, 15. Craig R. Beavis R.C. TANDEM: matching proteins with tandem mass spectra. Bioinformatics.
2004; 20: 1466-1470, 16. Geer L.Y. Markey S.P. Kowalak J.A. Wagner L. Xu M. Maynard D.M. Yang X. Shi W. Bryant S.H. Open mass spectrometry search algorithm. J. Proteome Res. 2004; 3: 958-964, 17. Colinge J. Masselot A. Giron M. Dessingy T. Magnin J. OLAV: towards high-throughput tandem mass spectrometry data identification. Proteomics. 2003; 3: 1454-1463). These programs correlate the experimental MS/MS spectra with theoretical fragmentation patterns of peptides obtained from a sequence database and use various scoring schemes to find the best matching peptide sequence. This high throughput protein identification process, however, is prone to false positives resulting from incorrect peptide assignments to MS/MS spectra by the database search tools (5. Nesvizhskii A.I. Aebersold R. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov. Today. 2004; 9: 173-181, 18. Keller A. Nesvizhskii A.I. Kolker E. Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002; 74: 5383-5392, 19. Nesvizhskii A.I. Keller A. Kolker E. Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003; 75: 4646-4658, 20. Baldwin M.A. Protein identification by mass spectrometry: issues to be considered. Mol. Cell. Proteomics. 2004; 3: 1-9, 21. Carr S. Aebersold R. Baldwin M. Burlingame A. Clauser K. Nesvizhskii A. The need for guidelines in publication of peptide and protein identification data. Mol. Cell. Proteomics.
2004; 3: 531-533). The problem of false positives has received significant attention in recent years. As a result, statistical approaches and computational tools were developed for assigning confidence measures to peptide and protein identifications and for estimating the false identification rates. These tools reduce the need for time-consuming manual verification of peptide assignments and allow faster and more consistent analysis of large scale datasets (5. Nesvizhskii A.I. Aebersold R. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov. Today. 2004; 9: 173-181). Despite the progress in developing new database search tools and methods for statistical validation of peptide assignments to MS/MS spectra, the number of spectra that remain “unassigned” (i.e. the sequence of the peptide that produced the spectrum is not known) in any experiment is significant. In fact, of all MS/MS spectra acquired in a typical shotgun proteomic experiment only a relatively small fraction (typically less than half in the case of ion trap instruments) is assigned a peptide with high confidence. The use of higher quality instruments having better mass accuracy and resolution alleviates the problem but does not eliminate it. Reasons for such a high failure rate include the deficiencies of the scoring schemes used to quantify the degree of similarity between the experimental spectrum and those predicted for database peptides, ambiguities in the determination of the charge state of the peptide ions selected for fragmentation (in low mass accuracy instruments), the presence of spectra arising from non-peptidic contaminants, concurrent fragmentation of multiple different precursor ions, and the low quality of many spectra due to excessive noise or incomplete peptide fragmentation (5. Nesvizhskii A.I.
Aebersold R. Analysis, statistical validation and dissemination of large-scale proteomics datasets generated by tandem MS. Drug Discov. Today. 2004; 9: 173-181, 7. Johnson R.S. Davis M.T. Taylor J.A. Patterson S.D. Informatics for protein identification by mass spectrometry. Methods. 2005; 35: 223-236, 22. Resing K.A. Meyer-Arendt K. Mendoza A.M. Aveline-Wolf L.D. Jonscher K.R. Pierce K.G. Old W.M. Cheung H.T. Russell S. Wattawa J.L. Goehle G.R. Knight R.D. Ahn N.G. Improving reproducibility and sensitivity in identifying human proteins by shotgun proteomics. Anal. Chem. 2004; 76: 3556-3568, 23. Chalkley R.J. Baker P.R. Hansen K.C. Medzihradszky K.F. Allen N.P. Rexach M. Burlingame A.L. Comprehensive analysis of a multidimensional liquid chromatography mass spectrometry dataset acquired on a quadrupole selecting, quadrupole collision cell, time-of-flight mass spectrometer: I. How much of the data is theoretically interpretable by search engines? Mol. Cell. Proteomics. 2005; 4: 1189-1193). However, a significant fraction of high quality, peptide-derived spectra also remain unassigned, and the failure of the database search tools to interpret them correctly cannot be explained by the reasons mentioned above. One source of unassigned high quality spectra is peptides containing a post-translational or chemical modification. To speed up the analysis, MS/MS database searching is often performed in a way that does not anticipate the presence of modifications in the peptides analyzed, and the modified peptides are therefore missed. Another source of such spectra is peptides whose sequence is not present in the searched protein sequence database, e.g. peptides corresponding to unanticipated alternative splice forms or sequence variants of known proteins (polymorphisms).
Although identification of modified, mutated, or novel peptides can be of high significance, the high quality spectra that remain unassigned after the initial database search pass through the data in most high throughput experiments are not further analyzed. (The term “novel” peptide here refers to a peptide identified by searching an unannotated genomic database and whose sequence is not present in any major protein sequence database for the corresponding organism.) Finding such spectra would normally entail manually sifting through large amounts of low quality data, which is rarely practiced due to the ever increasing number of newly acquired datasets waiting to be processed and interpreted. Thus, there is a clear need to develop computational tools for automated spectrum quality assessment that could be used to detect unassigned high quality spectra and mark them for subsequent reanalysis. Such quality assessment tools can also be used for a different task, e.g. to filter out low quality spectra prior to database searching to reduce the computational time or to assist in the process of discriminating between correct and incorrect peptide assignments. In this work, we present a dynamic quality scoring approach for finding high quality unassigned spectra in large shotgun proteomic datasets. The basic idea behind the method is that in the first database search pass through the data high confidence peptide assignments are generally based on high quality MS/MS spectra. Therefore, the notion of what constitutes a high quality spectrum can be learned from the analyzed data itself, i.e. without relying on a training dataset created using spectra from a different experiment (24. Nesvizhskii A.I. Vogelzang M. Aebersold R. Measuring MS/MS spectrum quality using a robust multivariate classifier. in: Proceedings of the 52nd American Society for Mass Spectrometry Conference on Mass Spectrometry and Allied Topics, Nashville, TN (May 23–27, 2004), Abstr. ThPA012.
American Society for Mass Spectrometry, Santa Fe, NM, 2004). This way the statistical classifier is automatically developed for each dataset anew, ensuring the robustness of the method toward variations in the MS/MS spectrum properties caused by differences in acquisition methods or instrument to instrument variability. This distinguishes this method from other recently described approaches (25. Moore R.E. Young M.K. Lee T.D. Method for screening peptide fragment ion mass spectra prior to database searching. J. Am. Soc. Mass Spectrom. 2000; 11: 422-426, 27. Bern M. Goldberg D. McDonald W.H. Yates III, J.R. Automatic quality assessment of peptide tandem mass spectra. Bioinformatics. 2004; 20: I49-I54, 28. Purvine S. Kolker N. Kolker E. Spectral quality assessment for high-throughput tandem mass spectrometry proteomics. OMICS. 2004; 8: 255-265, 29. Xu M. Geer L.Y. Bryant S.H. Roth J.S. Kowalak J.A. Maynard D.M. Markey S.P. Assessing data quality of peptide mass spectra obtained by quadrupole ion trap mass spectrometry. J. Proteome Res. 2005; 4: 300-305; A. I. Nesvizhskii, unpublished; see also Ref. 18. Keller A. Nesvizhskii A.I. Kolker E. Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002; 74: 5383-5392) that can produce inaccurate results when applied to spectra that differ significantly from those used for training. The statistical spectrum quality classifier is computed using an extended set of spectrum features, including those designed to take into account the knowledge of the peptide fragmentation process. The accuracy of the method is evaluated using a dataset of MS/MS spectra from a recent experiment on human lipid rafts (30. Von Haller P.D. Yi E. Donohoe S.
Vaughn K. Keller A. Nesvizhskii A.I. Eng J. Li X.J. Goodlett D.R. Aebersold R. Watts J.D. The application of new software tools to quantitative protein profiling via ICAT and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis and the application of statistical tools for data analysis and interpretation. Mol. Cell. Proteomics. 2003; 2: 428-442). The results indicate that the spectrum quality classifier enables fast and automated detection of high quality spectra left unassigned during the initial computational pass through the data. We also demonstrate that by interrogating those unassigned high quality spectra more comprehensively using existing protein sequence databases and by searching against genomic databases, one can significantly increase the number of identified peptides, including peptides containing modifications and sequence polymorphisms. Furthermore reanalysis of high quality spectra can lead to the identification of novel peptides, e.g. peptides confirming computationally predicted alternative splice forms. Three experimental datasets of MS/MS spectra were used to evaluate and optimize the quality scoring method and to investigate the sources of peptides that produced those spectra. All spectra were acquired using ESI ion trap tandem mass spectrometers. 1) The Haemophilus influenzae dataset consisted of 15 LC-MS/MS runs and was used previously as a test dataset for the development of a statistical model for validation of peptide assignments to MS/MS spectra (19. Nesvizhskii A.I. Keller A. Kolker E. Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003; 75: 4646-4658).
Sample proteins were digested using trypsin, and the resulting peptide mixtures were separated using reverse phase and strong cation exchange (SCX) chromatography prior to MS/MS sequencing. (The abbreviations used are: SCX, strong cation exchange; IPI, International Protein Index; SQS, spectrum quality score; LDF, linear discriminant function; ROC, receiver operator characteristic; EST, expressed sequence tag; SNP, single nucleotide polymorphism; LIME1, Lck-interacting transmembrane adaptor 1 protein.) The dataset contained more than 16,000 multiply charged MS/MS spectra generated from the membrane fraction of the sample. The spectra were searched using SEQUEST as described previously (19. Nesvizhskii A.I. Keller A. Kolker E. Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003; 75: 4646-4658), and peptide assignments to the spectra were processed using PeptideProphet. Of the 31,728 search results (multiply charged spectra were searched twice, assuming 2+ or 3+ charge state, because the exact charge could not be determined), 4229 MS/MS spectra were assigned a peptide with probability above 0.9 (3457 and 730 assignments to spectra of doubly charged and triply charged precursor ions, respectively). This dataset was used to optimize the method for computing spectrum quality scores and to test the accuracy of the method as a function of the dataset size. 2) The Arabidopsis dataset (F. F. Roos, J. Grossmann, W. Gruissem, and S. Baginsky, unpublished data) was acquired from four protein mixtures derived from cultured Arabidopsis thaliana cells. The crude extract was loaded on a one-dimensional SDS gel, and single bands were cut out. Each band was digested with trypsin, and peptides were sequenced by LC-MS/MS.
In total, 3420 MS/MS spectra were submitted to a SEQUEST database search against a protein database of Arabidopsis and known contaminants, allowing for semitryptic (tryptic at one terminus only) peptides with a mass tolerance of 3 Da and specifying carboxyamidomethylation of cysteines and methionine oxidation as variable modifications. This search resulted in 6720 peptide assignments (counting 2+/3+ duplicates). Of those, 924 were peptide assignments to spectra with a probability of being correct greater than 0.90 as computed by PeptideProphet. Additional database searches were performed using this dataset to test the ability of the method to recover high quality unassigned spectra. 3) The human lipid raft dataset was taken from a large scale quantitative proteomic experiment on lipid raft plasma membrane domains from human Jurkat T cells (30. Von Haller P.D. Yi E. Donohoe S. Vaughn K. Keller A. Nesvizhskii A.I. Eng J. Li X.J. Goodlett D.R. Aebersold R. Watts J.D. The application of new software tools to quantitative protein profiling via ICAT and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis and the application of statistical tools for data analysis and interpretation. Mol. Cell. Proteomics. 2003; 2: 428-442). The full dataset is available from the PeptideAtlas MS/MS data repository (www.peptideatlas.org) (31. Desiere F. Deutsch E.W. Nesvizhskii A.I. Mallick P. King N.L. Eng J.K. Aderem A. Boyle R. Brunner E. Donohoe S. Fausto N. Hafen E. Hood L. Katze M.G. Kennedy K.A. Kregenow F. Lee H. Lin B. Martin D. Ranish J.A. Rawlings D.J. Samelson L.E. Shiio Y. Watts J.D. Wollscheid B. Wright M.E. Yan W. Yang L. Yi E.C. Zhang H. Aebersold R. Integration with the human genome of peptide sequences obtained by high-throughput mass spectrometry. Genome Biol. 2005; 6: R5).
Nine LC/MS/MS runs (flow-through sample, SCX fractions 31 through 39) were selected for the analysis in this work. The spectra, 12,864 in total, were first searched using SEQUEST against the human International Protein Index (IPI) database (32. Kersey P.J. Duarte J. Williams A. Karavidopoulou Y. Birney E. Apweiler R. The International Protein Index: an integrated database for proteomics experiments. Proteomics. 2004; 4: 1985-1988), version 2.35, allowing for semitryptic peptides with a mass tolerance of 3 Da and allowing for methionine oxidation as a variable modification. This resulted in 24,197 peptide assignments, counting 2+/3+ duplicates (711 singly charged, 12,153 doubly charged, and 12,044 triply charged). Of those, 4171 were peptide assignments to spectra with a PeptideProphet probability of being correct equal to or greater than 0.9 (4034 peptide assignments to 2+/3+ spectra and 137 assignments to 1+ spectra). The results were further analyzed using ProteinProphet (19. Nesvizhskii A.I. Keller A. Kolker E. Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003; 75: 4646-4658), resulting in 315 proteins having a ProteinProphet probability above 0.9 identified by 660 unique peptides. The number of proteins was determined by counting the number of entries in the minimal list of proteins sufficient to explain all observed peptides (19. Nesvizhskii A.I. Keller A. Kolker E. Aebersold R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 2003; 75: 4646-4658). The list of peptide assignments and the corresponding list of protein accession numbers are provided in Supplemental Table S1 (the results of the analysis of the entire dataset, including ICAT samples, can be found on the PeptideAtlas website).
In calculating the number of assigned spectra, a spectrum was counted only if it was assigned a peptide corresponding to a protein with ProteinProphet probability greater than 0.9. To eliminate random matches to proteins correctly identified by other peptides, an additional peptide level constraint was applied: a spectrum was counted as assigned only if the peptide was identified (from this or another spectrum in the dataset) with PeptideProphet probability equal to or greater than 0.5. To determine what types of peptides generated the high quality spectra unassigned in the initial search, a subset of the initially unassigned MS/MS spectra was reanalyzed using a number of additional searches. 1) “Large mass tolerance” search: SEQUEST, semitryptic, 5-Da mass tolerance (larger mass tolerance compared with the initial 3-Da search). 2) “4+/5+ charge state” search: SEQUEST search, semitryptic, 3-Da mass tolerance, assuming 4+ or 5+ charge state. 3) “Pyro-Glu” search: SEQUEST, semitryptic, 3-Da mass tolerance, allowing for conversion of N-terminal glutamine to pyroglutamic acid (loss of 17 Da) as a variable modification. Due to a limitation of SEQUEST, the search was performed allowing for modified residues to be located anywhere in the sequence; peptides that did not contain glutamine at the N terminus were filtered out at the data validation stage. 4) “Mascot” searches: Mascot, tryptic peptides only (note that the Mascot tryptic search allows for removal of the initiating Met), two or fewer missed cleavages, 3-Da mass tolerance, allowing for N-terminal acetylation (+42 Da) or carbamylation (+43 Da) as a variable modification. 5) Miscellaneous searches: X! Tandem (13. Field H.I. Fenyo D. Beavis R.C. RADARS, a bioinformatics solution that automates proteome mass spectral analysis, optimizes protein identification, and archives data in a relational database. Proteomics.
2002; 2: 36-47) search allowing for more than one type of modification per peptide; SEQUEST and Mascot searches allowing for modifications not specified in the previous searches (e.g. conversion of N-terminal glutamic acid residues to pyroglutamic acid, phosphorylation, guanidination, etc.). These searches were performed against the same version of the IPI database as the initial search (version 2.35). The results from different searches were processed using Interact and PeptideProphet and then manually scrutinized. Validated peptide sequences were imported into a relational structured query language database, and the total number of assigned spectra, the number of unique peptides, and the minimum number of protein identifications sufficient to explain all identified peptides were calculated. In addition, the spectra were searched against a genomic database as described under “Results.” Prior to computing spectrum features, two peak list subsets are extracted from each spectrum. The main reason for this step is a reduction of the number of noise peaks in the spectrum whose presence can lower the discriminating power of some spectral features. The two subsets of peaks are extracted for each spectrum in parallel using the following approaches. 1) A signal intensity cutoff is applied using a robust percentile-based approach, creating the “high intensity” peak subset. To account for the significant variability of the fragment ion intensities across the spectrum, the spectrum is divided into five equally sized m/z sections. Within each section, the peaks are sorted according to their intensity, and the intensity at a given percentile is used as the cutoff value (i.e. the signal intensity at the 50th percentile would be equal to the median intensity of the peaks in a section). All peaks with intensity above the cutoff are assigned to the high intensity peak subset.
Depending on the instrument type and settings, MS/MS spectra may contain a large number of noise peaks. To make the percentile approach more robust, the number of peaks taken into account is restricted to the 400 most intense peaks per 1000-Da interval. 2) Spectrum deisotoping is performed based on a Poisson distribution model, thus creating the “deisotoped” peak subset. The peak heights of the isotope patterns are predicted by a Poisson distribution model with fragment masses, and the “heavy isotope excess” constant is estimated by dividing the sum of the average masses of all amino acids by the sum of the monoisotopic masses. The theoretical distribution and the actual spectrum are compared, and the χ² test statistic is calculated to measure the quality of the fit. All peaks with the χ² value below 20 are assigned to the deisotoped peak list subset. This represents a relatively loose threshold appropriate for low mass resolution data where the observed isotope distributions of heavy ions tend to deviate substantially fro" @default.
- W2007176869 created "2016-06-24" @default.
- W2007176869 creator A5004700665 @default.
- W2007176869 creator A5006955632 @default.
- W2007176869 creator A5007337114 @default.
- W2007176869 creator A5027765355 @default.
- W2007176869 creator A5053361384 @default.
- W2007176869 creator A5062948403 @default.
- W2007176869 creator A5085497637 @default.
- W2007176869 creator A5090561267 @default.
- W2007176869 date "2006-04-01" @default.
- W2007176869 modified "2023-10-16" @default.
- W2007176869 title "Dynamic Spectrum Quality Assessment and Iterative Computational Analysis of Shotgun Proteomic Data" @default.
- W2007176869 cites W1964443249 @default.
- W2007176869 cites W1971579210 @default.
- W2007176869 cites W1971887998 @default.
- W2007176869 cites W1981593008 @default.
- W2007176869 cites W1988265548 @default.
- W2007176869 cites W1995693931 @default.
- W2007176869 cites W1997149146 @default.
- W2007176869 cites W2003352316 @default.
- W2007176869 cites W2003463139 @default.
- W2007176869 cites W2005716310 @default.
- W2007176869 cites W2008317045 @default.
- W2007176869 cites W2015556811 @default.
- W2007176869 cites W2016605939 @default.
- W2007176869 cites W2023096047 @default.
- W2007176869 cites W2024141877 @default.
- W2007176869 cites W2025554716 @default.
- W2007176869 cites W2026465178 @default.
- W2007176869 cites W2030598149 @default.
- W2007176869 cites W2033745703 @default.
- W2007176869 cites W2034645824 @default.
- W2007176869 cites W2035283832 @default.
- W2007176869 cites W2036466798 @default.
- W2007176869 cites W2039998576 @default.
- W2007176869 cites W2043285998 @default.
- W2007176869 cites W2044543519 @default.
- W2007176869 cites W2044917750 @default.
- W2007176869 cites W2045592818 @default.
- W2007176869 cites W2045985536 @default.
- W2007176869 cites W2047275456 @default.
- W2007176869 cites W2048552782 @default.
- W2007176869 cites W2051795211 @default.
- W2007176869 cites W2058135122 @default.
- W2007176869 cites W2060318638 @default.
- W2007176869 cites W2060494738 @default.
- W2007176869 cites W2068781400 @default.
- W2007176869 cites W2075837782 @default.
- W2007176869 cites W2077000131 @default.
- W2007176869 cites W2083870231 @default.
- W2007176869 cites W2093620473 @default.
- W2007176869 cites W2093872018 @default.
- W2007176869 cites W2095898845 @default.
- W2007176869 cites W2107700010 @default.
- W2007176869 cites W2110603815 @default.
- W2007176869 cites W2112078820 @default.
- W2007176869 cites W2112456433 @default.
- W2007176869 cites W2113054326 @default.
- W2007176869 cites W2113945996 @default.
- W2007176869 cites W2130706354 @default.
- W2007176869 cites W2135337307 @default.
- W2007176869 cites W2137072570 @default.
- W2007176869 cites W2138754193 @default.
- W2007176869 cites W2145361425 @default.
- W2007176869 cites W2148632411 @default.
- W2007176869 cites W2149907293 @default.
- W2007176869 cites W2153181113 @default.
- W2007176869 cites W2155443196 @default.
- W2007176869 cites W2161189461 @default.
- W2007176869 cites W2171480972 @default.
- W2007176869 cites W3136323740 @default.
- W2007176869 cites W4236715020 @default.
- W2007176869 cites W4238323524 @default.
- W2007176869 cites W4371640544 @default.
- W2007176869 doi "https://doi.org/10.1074/mcp.m500319-mcp200" @default.
- W2007176869 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/16352522" @default.
- W2007176869 hasPublicationYear "2006" @default.
- W2007176869 type Work @default.
- W2007176869 sameAs 2007176869 @default.
- W2007176869 citedByCount "176" @default.
- W2007176869 countsByYear W20071768692012 @default.
- W2007176869 countsByYear W20071768692013 @default.
- W2007176869 countsByYear W20071768692014 @default.
- W2007176869 countsByYear W20071768692015 @default.
- W2007176869 countsByYear W20071768692016 @default.
- W2007176869 countsByYear W20071768692017 @default.
- W2007176869 countsByYear W20071768692018 @default.
- W2007176869 countsByYear W20071768692019 @default.
- W2007176869 countsByYear W20071768692020 @default.
- W2007176869 countsByYear W20071768692021 @default.
- W2007176869 countsByYear W20071768692022 @default.
- W2007176869 countsByYear W20071768692023 @default.
- W2007176869 crossrefType "journal-article" @default.
- W2007176869 hasAuthorship W2007176869A5004700665 @default.
- W2007176869 hasAuthorship W2007176869A5006955632 @default.
- W2007176869 hasAuthorship W2007176869A5007337114 @default.
- W2007176869 hasAuthorship W2007176869A5027765355 @default.