Matches in SemOpenAlex for { <https://semopenalex.org/work/W2104394966> ?p ?o ?g. }
- W2104394966 endingPage "2718" @default.
- W2104394966 startingPage "2704" @default.
- W2104394966 abstract "Liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomics provides a wealth of information about proteins present in biological samples. In bottom-up LC-MS/MS-based proteomics, proteins are enzymatically digested into peptides prior to query by LC-MS/MS. Thus, the information directly available from the LC-MS/MS data is at the peptide level. If a protein-level analysis is desired, the peptide-level information must be rolled up into protein-level information. We propose a principal component analysis-based statistical method, ProPCA, for efficiently estimating relative protein abundance from bottom-up label-free LC-MS/MS data that incorporates both spectral count information and LC-MS peptide ion peak attributes, such as peak area, volume, or height. ProPCA may be used effectively with a variety of quantification platforms and is easily implemented. We show that ProPCA outperformed existing quantitative methods for peptide-protein roll-up, including spectral counting methods and other methods for combining LC-MS peptide peak attributes. The performance of ProPCA was validated using a data set derived from the LC-MS/MS analysis of a mixture of protein standards (the UPS2 proteomic dynamic range standard introduced by The Association of Biomolecular Resource Facilities Proteomics Standards Research Group in 2006). Finally, we applied ProPCA to a comparative LC-MS/MS analysis of digested total cell lysates prepared for LC-MS/MS analysis by alternative lysis methods and show that ProPCA identified more differentially abundant proteins than competing methods.
One of the fundamental goals of proteomics methods for the biological sciences is to identify and quantify all proteins present in a sample. LC-MS/MS-based proteomics methodologies offer a promising approach to this problem (1. Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207; 2. Domon B. Aebersold R. Mass spectrometry and protein analysis. Science. 2006; 312: 212-217; 3. Cravatt B.F. Simon G.M. Yates 3rd, J.R. The biological impact of mass-spectrometry-based proteomics. Nature. 2007; 450: 991-1000).
These methodologies allow for the acquisition of a vast amount of information about the proteins present in a sample. However, extracting reliable protein abundance information from LC-MS/MS data remains challenging. In this work, we were primarily concerned with the analysis of data acquired using bottom-up label-free LC-MS/MS-based proteomics techniques where “bottom-up” refers to the fact that proteins are enzymatically digested into peptides prior to query by the LC-MS/MS instrument platform (4. Kelleher N. Lin H. Valaskovic G. Aaserud D. Fridriksson E. McLafferty F. Top down versus bottom up protein characterization by tandem high-resolution mass spectrometry. J. Am. Chem. Soc. 1999; 121: 806-812), and “label-free” indicates that analyses are performed without the aid of stable isotope labels. One challenge inherent in the bottom-up approach to proteomics is that information directly available from the LC-MS/MS data is at the peptide level. When a protein-level analysis is desired, as is often the case with discovery-driven LC-MS research, peptide-level information must be rolled up into protein-level information. Spectral counting (5. Liu H. Sadygov R.G. Yates 3rd, J.R. A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 2004; 76: 4193-4201; 6. Ishihama Y. Oda Y. Tabata T. Sato T. Nagasu T. Rappsilber J. Mann M. Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein. Mol. Cell. Proteomics. 2005; 4: 1265-1272; 7. Lu P. Vogel C. Wang R. Yao X. Marcotte E.M. Absolute protein expression profiling estimates the relative contributions of transcriptional and translational regulation. Nat. Biotechnol. 2007; 25: 117-124; 8. Schmidt M.W. Houseman A. Ivanov A.R. Wolf D.A. Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe. Mol. Syst. Biol. 2007; 3: 79; 9. Choi H. Fermin D. Nesvizhskii A.I. Significance analysis of spectral count data in label-free shotgun proteomics. Mol. Cell. Proteomics. 2008; 7: 2373-2385; 10. Zybailov B. Coleman M.K. Florens L. Washburn M.P. Correlation of relative abundance ratios derived from peptide ion chromatograms and spectrum counting for quantitative proteomic analysis using stable isotope labeling. Anal. Chem. 2005; 77: 6218-6224) is a straightforward and widely used example of peptide-protein roll-up for LC-MS/MS data. Information experimentally acquired in single stage (MS) and tandem (MS/MS) spectra may lead to the assignment of MS/MS spectra to peptide sequences in a database-driven or database-free manner using various peptide identification software platforms (SEQUEST (11. Eng J. McCormack A. Yates 3rd, J.R. An approach to correlate tandem mass spectra data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989) and Mascot (12. Pappin D.J. Hojrup P. Bleasby A.J. Rapid identification of proteins by peptide-mass fingerprinting. Curr. Biol. 1993; 3: 327-332), for instance); the identified peptide sequences correspond, in turn, to proteins.
In principle, the number of tandem spectra matched to peptides corresponding to a certain protein, the spectral count (SC), is positively associated with the abundance of a protein (5). (The abbreviations used are: SC, spectral count; AMT, accurate mass and time; FDR, false discovery rate; HFIP, 1,1,1,3,3,3-hexafluoro-2-propanol (hexafluoroisopropanol); PCA, principal component analysis; PPA, peptide peak attribute; ProALT, alternative peptide-protein roll-up procedure; ProPCA, PCA-based peptide-protein roll-up procedure; TCEP, tris(2-carboxyethyl)phosphine hydrochloride; GO, gene ontology.) In spectral counting techniques, raw or normalized SCs are used as a surrogate for protein abundance. Spectral counting methods have been moderately successful in quantifying protein abundance and identifying significant proteins in various settings. However, SC-based methods do not make full use of information available from peaks in the LC-MS domain, and this surely leads to loss of efficiency. Peaks in the LC-MS domain corresponding to peptide ion species are highly sensitive to differences in protein abundance (13. Wang W. Zhou H. Lin H. Roy S. Shaler T.A. Hill L.R. Norton S. Kumar P. Anderle M. Becker C.H. Quantification of proteins and metabolites by mass spectrometry without isotopic labeling or spiked standards. Anal. Chem. 2003; 75: 4818-4826; 14. Radulovic D. Jelveh S. Ryu S. Hamilton T.G. Foss E. Mao Y. Emili A. Informatics platform for global proteomic profiling and biomarker discovery using liquid chromatography-tandem mass spectrometry. Mol. Cell. Proteomics. 2004; 3: 984-997). Identifying LC-MS peaks that correspond to detected peptides and measuring quantitative attributes of these peaks (such as height, area, or volume) offers a promising alternative to spectral counting methods. These methods have become especially popular in applications using stable isotope labeling (15. Cox J. Mann M. MaxQuant enables high peptide identification rates, individualized ppb-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 2008; 26: 1367-1372). However, challenges remain, especially in the label-free analysis of complex proteomics samples where complications in peak detection, alignment, and integration are a significant obstacle. In practice, alignment, identification, and quantification of LC-MS peptide peak attributes (PPAs) may be accomplished using recently developed peak matching platforms (16. Jaffe J.D. Mani D.R. Leptos K.C. Church G.M. Gillette M.A. Carr S.A. PEPPeR, a platform for experimental proteomic pattern recognition. Mol. Cell. Proteomics. 2006; 5: 1927-1941; 17. Bellew M. Coram M. Fitzgibbon M. Igra M. Randolph T. Wang P. May D. Eng J. Fang R. Lin C. Chen J. Goodlett D. Whiteaker J. Paulovich A. McIntosh M. A suite of algorithms for the comprehensive analysis of complex protein mixtures using high-resolution LC-MS. Bioinformatics. 2006; 22: 1902-1909; 18. May D. Fitzgibbon M. Liu Y. Holzman T. Eng J. Kemp C.J. Whiteaker J. Paulovich A. McIntosh M. A platform for accurate mass and time analyses of mass spectrometry data. J. Proteome Res. 2007; 6: 2685-2694).
A highly sensitive indicator of protein abundance may be obtained by rolling up PPA measurements into protein-level information (16; 19. Polpitiya A.D. Qian W.J. Jaitly N. Petyuk V.A. Adkins J.N. Camp 2nd, D.G. Anderson G.A. Smith R.D. DAnTE: a statistical tool for quantitative analysis of -omics data. Bioinformatics. 2008; 24: 1556-1558; 20. Griffin N.M. Yu J. Long F. Oh P. Shore S. Li Y. Koziol J.A. Schnitzer J.E. Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis. Nat. Biotechnol. 2010; 28: 83-89). Existing peptide-protein roll-up procedures based on PPAs typically involve taking the mean of (possibly normalized) PPA measurements over all peptides corresponding to a protein to obtain a protein-level estimate of abundance. Despite the promise of PPA-based procedures for protein quantification, the performance of PPA-based methods may vary widely depending on the particular roll-up procedure used; furthermore, PPA-based procedures are limited by difficulties in accurately identifying and measuring peptide peak attributes. These two issues are related as the latter issue affects the robustness of PPA-based roll-up methods. Indeed, existing peak matching and quantification platforms tend to result in PPA measurement data sets with substantial missingness (16, 19; 21. Katajamaa M. Oresic M. Processing methods for differential analysis of LC/MS profile data. BMC Bioinformatics. 2005; 6: 179), especially when working with very complex samples where substantial dynamic ranges and ion suppression are difficulties that must be overcome. Missingness may, in turn, lead to instability in protein-level abundance estimates. A good peptide-protein roll-up procedure that utilizes PPAs should account for this missingness and the resulting instability in a principled way. However, even in the absence of missingness, there is no consensus in the existing literature on peptide-protein roll-up for PPA measurements. In this work, we propose ProPCA, a peptide-protein roll-up method for efficiently extracting protein abundance information from bottom-up label-free LC-MS/MS data. ProPCA is an easily implemented, unsupervised method that is related to principal component analysis (PCA) (22. Rencher A. Methods of Multivariate Analysis. 2nd Ed. Wiley-Interscience, New York 2002: 380-407). ProPCA optimally combines SC and PPA data to obtain estimates of relative protein abundance. ProPCA addresses missingness in PPA measurement data in a unified way while capitalizing on strengths of both SCs and PPA-based roll-up methods. In particular, ProPCA adapts to the quality of the available PPA measurement data. If the PPA measurement data are poor and, in the extreme case, no PPA measurements are available, then ProPCA is equivalent to spectral counting. On the other hand, if there is no missingness in the PPA measurement data set, then the ProPCA estimate is a weighted mean of PPA measurements and spectral counts where the weights are chosen to reflect the ability of spectral counts and each peptide to predict protein abundance.
Below, we assess the performance of ProPCA using a data set obtained from the LC-MS/MS analysis of protein standards (UPS2 proteomic dynamic range standard set manufactured by Sigma-Aldrich; the standard was introduced in 2006 by P. C. Andrews, D. P. Arnott, M. A. Gawinowicz, J. A. Kowalak, W. S. Lane, K. S. Lilley, L. T. Martin, and S. E. Stein, The Association of Biomolecular Resource Facilities Proteomics Standards Research Group, unpublished data) and show that ProPCA outperformed other existing roll-up methods by multiple metrics. The applicability of ProPCA is not limited by the quantification platform used to obtain SCs and PPA measurements. To demonstrate this, we show that ProPCA continued to perform well when used with an alternative quantification platform. Finally, we applied ProPCA to a comparative LC-MS/MS analysis of digested total human hepatocellular carcinoma (HepG2) cell lysates prepared for LC-MS/MS analysis by alternative lysis methods. We show that ProPCA identified more differentially abundant proteins than competing methods.
A CTC Autosampler (LEAP Technologies) was equipped with two 10-port Valco valves and a 20-μl injection loop. A 2D LC system (Eksigent) was used to deliver a flow rate of 3 μl/min during sample loading and 250 nl/min during nanoflow LC separation. Self-packed columns included a C18 solid phase extraction “trapping” column (250-μm inner diameter × 10 mm) and a nano-LC capillary column (100-μm inner diameter × 15 cm, 8-μm-inner diameter pulled tip (New Objective)), both packed with Magic C18AQ, 3-μm, 200-Å (Michrom Bioresources) stationary phase. A protein digest (10 μl) approximately equivalent to 70 μg of the initial protein extract was injected onto the trapping column connected on line with the nano-LC column through the 10-port Valco valve. The sample was cleaned up and concentrated using the trapping column and eluted onto and separated on the nano-LC column with a 1-h linear gradient of acetonitrile in 0.1% formic acid.
The LC-MS/MS solvents were 2% acetonitrile in aqueous 0.1% formic acid (Solvent A) and 5% isopropanol, 85% acetonitrile in aqueous 0.1% formic acid (Solvent B). The 85-min-long LC gradient program included the following elution conditions: 2% B for 1 min, 2–35% B in 60 min, 35–90% B in 10 min, 90% B for 2 min, and 90–2% B in 2 min. The eluent was introduced into an LTQ Orbitrap (Thermo Electron) mass spectrometer equipped with a nanoelectrospray source (New Objective) by nanoelectrospray. The source voltage was set to 2.2 kV, and the temperature of the heated capillary was set to 180 °C. For each scan cycle, one full MS scan acquired in the Orbitrap mass analyzer at 60,000 mass resolution, 6 × 10⁵ automatic gain control target, and 1200-ms maximum ion accumulation time was followed by seven MS/MS scans acquired for the seven most intense ions for each of the following m/z ranges: 350–700, 695–1200, and 1195–1700 atomic mass units (amu). The LTQ mass analyzer was set for 30,000 automatic gain control target, 100-ms maximum accumulation time, 2.2-Da isolation width, and 30-ms activation at 35% normalized collision energy. Dynamic exclusion was enabled for 45 s for each of the 200 ions that already had been selected for fragmentation to exclude them from repeated fragmentation. The UPS2 samples were analyzed as described above using a shorter 15-min-long LC-MS gradient. Each of the UPS2 samples was analyzed by LC-MS/MS three to seven times. Each HepG2 digest was analyzed three times. For both the UPS2 standards and the HepG2 cell lysate analyses, the MS data .raw files acquired by the LTQ Orbitrap mass spectrometer and Xcalibur (version 2.0.6; Thermo Electron) were copied to the Sorcerer IDA2 search engine (version 3.5 RC2; Sage-N Research, Thermo Electron) and submitted for database searches using the SEQUEST-Sorcerer algorithm (version 4.0.4). For the UPS2 data, the search was performed against a concatenated FASTA database comprising 354 sequences in total.
This database contained the 48 UPS2 protein constituents and 129 proteins from an in-house database of common contaminants; reverse sequences for all proteins were included in the database. For the HepG2 data, the search was performed against a concatenated FASTA database containing 114,356 sequences in total and comprising 57,049 proteins from the human (25H.Sapiens) UniProtKB database downloaded from the European Molecular Biology Laboratory-European Bioinformatics Institute on October 23, 2008, the 129 common contaminants from our in-house database, and reverse sequences. Methionine, histidine, and tryptophan oxidation (+15.994915 amu) and cysteine alkylation (+57.021464 amu with iodoacetamide derivative) were set as differential modifications. No static modifications or differential posttranslational modifications were used. A peptide mass tolerance equal to 30 ppm and a fragment ion mass tolerance equal to 0.8 amu were used in all searches. Monoisotopic mass type, fully tryptic peptide termini, and up to two missed cleavages were used in all searches. Spectral count information was extracted from PeptideProphet files (stored in .pepXML format). We calculated the SC of a protein in a given sample by counting the number of MS/MS spectra in the sample matched to peptides that correspond to the protein under consideration. It may happen that a peptide corresponds to more than one protein. (In the UPS2 standard set, where a smaller database was used, 6.7% of identified peptides were matched to multiple proteins; in the HepG2 data set, 47% of identified peptides were matched to multiple proteins.) This may lead to ambiguity in assigning SCs. In our analysis, when a peptide was matched to multiple proteins, we randomly assigned the peptide to a single protein from the list of corresponding proteins. This may introduce additional noise into the data; however, because our focus was the comparison of peptide-protein roll-up procedures, this should not bias our results.
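The spectral counting rule just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name `spectral_counts`, the `(peptide, candidate_proteins)` input layout, and the fixed random seed are hypothetical, and the sketch assumes that a shared peptide is assigned once and that the same assignment is reused for every spectrum of that peptide.

```python
import random
from collections import Counter


def spectral_counts(psms, seed=0):
    """Count matched MS/MS spectra per protein for one sample.

    psms: iterable of (peptide, candidate_proteins) pairs, one pair per
    matched spectrum (hypothetical input layout).
    A peptide that maps to several proteins is assigned to one of them
    at random; the choice is reused for all spectra of that peptide.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    assignment = {}
    counts = Counter()
    for peptide, proteins in psms:
        if peptide not in assignment:
            assignment[peptide] = rng.choice(sorted(proteins))
        counts[assignment[peptide]] += 1
    return counts
```

Because the shared peptide is resolved once per peptide rather than once per spectrum, all of its spectra contribute to the same protein, which matches the description of assigning the peptide (not each spectrum) to a single protein.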
A more involved treatment of peptides matched to multiple proteins is possible, but this was not the focus of this project. The supplemental data contain protein identification information, including sequence coverage information, obtained from ProteinProphet for the UPS2 and HepG2 data; sequence coverage information for the UPS2 data is also displayed in supplemental Table S1. To preserve a low false positive rate, only MS/MS spectra matched to peptides with PeptideProphet probability greater than 0.95 were utilized when calculating spectral counts. Additionally, in our final analysis, we only considered proteins that were identified by at least two distinct peptides. The false positive rate was calculated as the number of peptide matches from a “reverse” database divided by the total number of “forward” protein matches, and then this value was converted to a percentage (similar to Peng et al. (23. Peng J. Elias J.E. Thoreen C.C. Licklider L.J. Gygi S.P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2003; 2: 43-50) and Qian et al. (24. Qian W.J. Liu T. Monroe M.E. Strittmatter E.F. Jacobs J.M. Kangas L.J. Petritis K. Camp 2nd, D.G. Smith R.D. Probability-based evaluation of peptide and protein identifications from tandem mass spectrometry and SEQUEST analysis: the human proteome. J. Proteome Res. 2005; 4: 53-62)). After these filtering steps, the false positive rate was <0.05% for both the UPS2 and HepG2 data. We used two software platforms, msInspect/AMT (build 221) (17, 18; 25. May D. Liu Y. Law W. Fitzgibbon M. Wang H. Hanash S. McIntosh M. Peptide sequence confidence in accurate mass and time analysis and its use in complex proteomics experiments. J. Proteome Res. 2008; 7: 5148-5156) and Progenesis LC-MS software (version 2.5; Nonlinear Dynamics), to obtain PPA measurements from the .raw files. Both software platforms utilize peak alignment algorithms and are capable of ascertaining PPA measurements for a given peptide in runs where the peptide was not identified at the MS/MS level by leveraging information from other runs. The msInspect/AMT peak alignment algorithm has been described (17, 18, 25); the Progenesis LC-MS software utilizes a proprietary alignment algorithm.
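The identification filters and the reverse-database false positive rate described earlier in this section can be illustrated as follows. This is a minimal sketch, not the pipeline used in the study: the dictionary keys (`peptide`, `protein`, `probability`), the function names, and the `rev_` prefix marking reverse-database entries are all assumptions made for the example.

```python
from collections import defaultdict


def filter_identifications(psms, min_prob=0.95, min_peptides=2):
    """Keep spectra with probability above min_prob, then keep only
    proteins identified by at least min_peptides distinct peptides."""
    confident = [p for p in psms if p["probability"] > min_prob]
    peptides_by_protein = defaultdict(set)
    for p in confident:
        peptides_by_protein[p["protein"]].add(p["peptide"])
    keep = {prot for prot, peps in peptides_by_protein.items()
            if len(peps) >= min_peptides}
    return [p for p in confident if p["protein"] in keep]


def false_positive_rate_percent(psms, decoy_prefix="rev_"):
    """Reverse-database matches divided by forward matches, as a percentage.

    decoy_prefix is a hypothetical naming convention for reverse sequences.
    """
    reverse = sum(p["protein"].startswith(decoy_prefix) for p in psms)
    forward = len(psms) - reverse
    if forward == 0:
        raise ValueError("no forward matches; rate is undefined")
    return 100.0 * reverse / forward
```

Applying `false_positive_rate_percent` to the output of `filter_identifications` gives the rate after filtering, which is the quantity the text reports as being below 0.05%.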
To obtain PPA measurements using msInspect/AMT, we first converted the .raw LC-MS/MS data files into .mzXML files (26. Pedrioli P.G. Eng J.K. Hubley R. Vogelzang M. Deutsch E.W. Raught B. Pratt B. Nilsson E. Angeletti R.H. Apweiler R. Cheung K. Costello C.E. Hermjakob H. Huang S. Julian R.K. Kapp E. McComb M.E. Oliver S.G. Omenn G. Paton N.W. Simpson R. Smith R. Taylor C.F. Zhu W. Aebersold R. A common open representation of mass spectrometry data and its application to proteomics research. Nat. Biotechnol. 2004; 22: 1459-1466) using the ReAdW software (latest version available at http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW). Using msInspect/AMT, we created an AMT database. In the first step, we found and filtered features (peptides) in the LC-MS domain. For the UPS2 data, we set “maxkl” to 3 and “minpeaks” to 2 when filtering features with default values for all other settings; the same settings were used for the HepG2 data, except we also set “minIntensity” to 28,000. Building the AMT database requires LC-MS peak information, obtained from filtered features, and the .pepXML files created after SEQUEST database searching. To create the AMT database for the UPS2 data, we set “mintime” to 900, “maxtime” to 5640, “deltatime” to 200, “deltamassppm” to 20, and “minpprophet” to 0.95; default values were used for all other settings. We used the same settings for the HepG2 cell lysate data, except we set mintime to 1680 and maxtime to 6480. Finally, to obtain PPA measurements, features in the LC-MS domain were matched to peptides identified via MS/MS spectra with the aid of the AMT database. For both the UPS2 and HepG2 data, the non-default settings used for the matching procedure were “deltatimems1ms2” of 200 and minpprophet of 0.95. To ensure that only high quality matches were used, matches with corresponding AMT match probabilities (25) less than 0.95 were ultimately discarded. The resulting AMT match data file contained the PPA information necessary for ProPCA and the other roll-up procedures we considered. The supplemental data include information from .pepXML files and msInspect/AMT match files, which contain PPA measurements, for all UPS2 and HepG2 samples. A similar procedure was followed to obtain PPA information using the Progenesis LC-MS software. We first uploaded our .raw files and grouped and aligned the LC-MS profiles using an option for setting alignment vectors automatically. After manual validation of the alignment results, additional vectors were manually inserted where needed, and the results of PeptideProphet analysis were loaded using the corresponding .pepXML files. The Progenesis LC-MS software allows filtering of MS/MS matches using XCorr versus peptide charge state SEQUEST scores. For charge states 1+, 2+, and ≥3+, we filtered out MS/MS matches with XCorr below 2, 2.5, and 3, respectively. The resulting false positive rate for peptide identification was <0.05%, and the resulting matches formed the basis for our analysis of the Progenesis data. The supplemental data contain the relevant Progenesis output, including PPA measurements for the UPS2 samples (the HepG2 samples were not analyzed with the Progenesis LC-MS software).
Let log(SC) denote the natural logarithm of SCs (before taking logarithms, we add 1 to each SC to avoid taking the logarithm of 0), and let log(PPA) denote the natural logarithm of PPA measurements. To motivate and derive the ProPCA estimator of relative protein abundance, consider the following model. Let $y_{ijk}$ represent log(PPA) for the $k$th peptide (or log(SC) if $k = 1$), corresponding to the $j$th protein in the $i$th sample.
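As a concrete reading of this notation, the observations for one protein in one sample can be collected into a single vector whose first entry is log(SC + 1) and whose remaining entries are log(PPA) values. The sketch below is illustrative only: the function name `observation_vector` and the use of `None` and NaN to mark a missing PPA measurement are assumptions, not conventions stated in the text.

```python
import math


def observation_vector(spectral_count, peak_attributes):
    """Build the observations for one protein in one sample.

    The first entry (k = 1) is log(SC + 1); each later entry is the
    natural log of one peptide peak attribute. A missing attribute,
    passed as None (an assumed convention), is stored as NaN.
    """
    y = [math.log(spectral_count + 1)]
    for ppa in peak_attributes:
        y.append(math.log(ppa) if ppa is not None else math.nan)
    return y
```

For example, `observation_vector(0, [1.0, None])` has a first entry of 0 because log(0 + 1) = 0, a second entry of 0 because log(1) = 0, and a NaN in the third position for the unmeasured peptide.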
We suppose that there are $N$ samples in total, that a total of $M$ proteins were identified, and that $P_j$ peptides correspond to the $j$th protein. Thus, for our observations $y_{ijk}$, the indices $i$, $j$, and $k$ run through $i = 1, \ldots, N$; $j = 1, \ldots, M$; and $k = 1, \ldots, P_j$. We let $\beta_{ij}$ denote the abundance of the $j$th protein in the $i$th sample. Given an approximately linear relationship between log(SC), log(PPA), and log protein abundance (discussed further under “Results”), a reasonable statistical model relating the observed log(PPA) or log(SC) values, $y_{ijk}$, and log protein abundance, $\beta_{ij}$, is given by $E\,y_{ijk} = \gamma_{0jk} + \gamma_{1jk}\beta_{ij}$ (Eq. 1), where $E\,y_{ijk}$ is the expected value of $y_{ijk}$, averaging over random noise, and $\gamma_{0jk}$ and $\gamma_{1jk}$ are peptide- (or, when $k = 1$, SC)-specific effects. Note that $\beta_{ij}$ in the model (Equation 1) is only iden" @default.
- W2104394966 created "2016-06-24" @default.
- W2104394966 creator A5011372682 @default.
- W2104394966 creator A5030660111 @default.
- W2104394966 creator A5065359511 @default.
- W2104394966 date "2010-12-01" @default.
- W2104394966 modified "2023-09-26" @default.
- W2104394966 title "Increased Power for the Analysis of Label-free LC-MS/MS Proteomics Data by Combining Spectral Counts and Peptide Peak Attributes" @default.
- W2104394966 cites W1607252236 @default.
- W2104394966 cites W1977291445 @default.
- W2104394966 cites W1989828258 @default.
- W2104394966 cites W1997504014 @default.
- W2104394966 cites W2005794750 @default.
- W2104394966 cites W2023096047 @default.
- W2104394966 cites W2026465178 @default.
- W2104394966 cites W2037837681 @default.
- W2104394966 cites W2039380477 @default.
- W2104394966 cites W2040482262 @default.
- W2104394966 cites W2053894359 @default.
- W2104394966 cites W2055063265 @default.
- W2104394966 cites W2063543381 @default.
- W2104394966 cites W2065436349 @default.
- W2104394966 cites W2072099326 @default.
- W2104394966 cites W2080752012 @default.
- W2104394966 cites W2089492568 @default.
- W2104394966 cites W2096863518 @default.
- W2104394966 cites W2107231094 @default.
- W2104394966 cites W2112078820 @default.
- W2104394966 cites W2116807043 @default.
- W2104394966 cites W2125419939 @default.
- W2104394966 cites W2146103026 @default.
- W2104394966 cites W2147830145 @default.
- W2104394966 cites W2148342439 @default.
- W2104394966 cites W2148490163 @default.
- W2104394966 cites W2148598563 @default.
- W2104394966 cites W2149414429 @default.
- W2104394966 cites W2149517572 @default.
- W2104394966 cites W2152180036 @default.
- W2104394966 cites W2153132589 @default.
- W2104394966 cites W2160110467 @default.
- W2104394966 cites W2161345806 @default.
- W2104394966 cites W2167443016 @default.
- W2104394966 cites W2168745915 @default.
- W2104394966 doi "https://doi.org/10.1074/mcp.m110.002774" @default.
- W2104394966 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/3101957" @default.
- W2104394966 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/20823122" @default.
- W2104394966 hasPublicationYear "2010" @default.
- W2104394966 type Work @default.
- W2104394966 sameAs 2104394966 @default.
- W2104394966 citedByCount "43" @default.
- W2104394966 countsByYear W21043949662012 @default.
- W2104394966 countsByYear W21043949662013 @default.
- W2104394966 countsByYear W21043949662014 @default.
- W2104394966 countsByYear W21043949662015 @default.
- W2104394966 countsByYear W21043949662016 @default.
- W2104394966 countsByYear W21043949662017 @default.
- W2104394966 countsByYear W21043949662018 @default.
- W2104394966 countsByYear W21043949662019 @default.
- W2104394966 countsByYear W21043949662020 @default.
- W2104394966 countsByYear W21043949662021 @default.
- W2104394966 countsByYear W21043949662022 @default.
- W2104394966 countsByYear W21043949662023 @default.
- W2104394966 crossrefType "journal-article" @default.
- W2104394966 hasAuthorship W2104394966A5011372682 @default.
- W2104394966 hasAuthorship W2104394966A5030660111 @default.
- W2104394966 hasAuthorship W2104394966A5065359511 @default.
- W2104394966 hasBestOaLocation W21043949661 @default.
- W2104394966 hasConcept C104317684 @default.
- W2104394966 hasConcept C162356407 @default.
- W2104394966 hasConcept C185592680 @default.
- W2104394966 hasConcept C2779281246 @default.
- W2104394966 hasConcept C43617362 @default.
- W2104394966 hasConcept C46111723 @default.
- W2104394966 hasConcept C55493867 @default.
- W2104394966 hasConcept C64489805 @default.
- W2104394966 hasConcept C73090800 @default.
- W2104394966 hasConcept C80311884 @default.
- W2104394966 hasConceptScore W2104394966C104317684 @default.
- W2104394966 hasConceptScore W2104394966C162356407 @default.
- W2104394966 hasConceptScore W2104394966C185592680 @default.
- W2104394966 hasConceptScore W2104394966C2779281246 @default.
- W2104394966 hasConceptScore W2104394966C43617362 @default.
- W2104394966 hasConceptScore W2104394966C46111723 @default.
- W2104394966 hasConceptScore W2104394966C55493867 @default.
- W2104394966 hasConceptScore W2104394966C64489805 @default.
- W2104394966 hasConceptScore W2104394966C73090800 @default.
- W2104394966 hasConceptScore W2104394966C80311884 @default.
- W2104394966 hasIssue "12" @default.
- W2104394966 hasLocation W21043949661 @default.
- W2104394966 hasLocation W21043949662 @default.
- W2104394966 hasLocation W21043949663 @default.
- W2104394966 hasLocation W21043949664 @default.
- W2104394966 hasLocation W21043949665 @default.
- W2104394966 hasOpenAccess W2104394966 @default.
- W2104394966 hasPrimaryLocation W21043949661 @default.
- W2104394966 hasRelatedWork W111275185 @default.
- W2104394966 hasRelatedWork W1975323042 @default.
- W2104394966 hasRelatedWork W1975558168 @default.