Matches in SemOpenAlex for { <https://semopenalex.org/work/W2130271547> ?p ?o ?g. }
- W2130271547 endingPage "NA" @default.
- W2130271547 startingPage "NA" @default.
- W2130271547 abstract "There is no dispute that the overwhelming majority of the 50,000 structures in the Protein Data Bank are essentially correct (i.e., lacking major experimental error). Nevertheless, there have been and there continue to be reports of protein structures which are seriously in error. A number of these are listed in Table I. It might be argued that as crystallographers have improved their techniques, and learned by experience, they should become “better” at solving structures. Table I suggests that this is not necessarily the case. Indeed, it is remarkable that all of the earliest protein structures to be determined at atomic resolution (myoglobin, lysozyme, chymotrypsin, carboxypeptidase, ribonuclease) have proven to be essentially correct and have not needed substantive revision. When these structures were first reported, now all over 40 years ago, the protagonists never knew quite what to expect. Also, in most cases the investigators had little if any experience in interpreting protein electron density maps. All of these early structures were determined using fairly high resolution data (∼2.5–2.0 Å) and multiple heavy-atom derivatives (because this is what had worked for myoglobin). Looking back, some of these structures were probably “overdetermined,” in the sense that the basic fold of the protein might have been found using a subset of the original data. The risk of using too few observations became apparent later (Table I). Not surprisingly, and for a good reason, structures found to be in error have prompted the development of checks (i.e., “model validation”), which can be used to try to ensure that similar mistakes do not occur in the future [29]. A number of these tests are based on expected stereochemistry. Are bond lengths and angles within normal limits? Are the Ramachandran angles acceptable? Are there nonbonded atoms that are placed too close to each other?
At the same time, such tests need to be applied with discretion [30]. For example, Hooft et al. [31] published a list of “errors” in protein structures in the PDB based on violations of various geometric criteria. They considered all D-amino acids in the Protein Data Bank to be “errors,” but some of these, for example, were from protein complexes with cell-wall fragments that contain bona fide D-amino acids. Others were from the antibiotic gramicidin, which also contains D-amino acids. Other tests of model structures are more crystallographic in nature. Are the crystallographic refinement residuals R and Rfree sufficiently low? Do the B-factors (indicative of mobility) correlate with solvent exposure? Does the placement of the atoms in the model match the local electron density? Yet, other tests are based on energetic considerations. Do polar atoms have polar neighbors and, likewise, are nonpolar atoms in nonpolar environments? Notwithstanding some limitations, tests such as these provide an overall way to estimate the quality of structural models in the PDB, and this is the subject of a recent report by Brown and Ramaswamy [32]. To use their words, “the most striking result is the association between structure quality and the journal in which the structure was first published. The worst offenders are the apparently high-impact general science journals that include Cell, Science, Molecular Cell and Nature. The rush to publish high-impact work in the competitive atmosphere may have led to the proliferation of poor-quality structures.” In the current issue of Protein Science, Sheffler and Baker [33] introduce a new method to test for the integrity of structure models based on core packing. Their approach does highlight specific entries in the PDB, which have been questioned as being unreliable. It also points to a type of error that might occur for other structures and would be quite difficult to detect, namely, a mistake in the dimensions of the crystallographic unit cell.
In a typical X-ray structure analysis, the dimensions of the unit cell are routinely obtained to high precision during X-ray data collection and are subsequently deposited in the PDB along with the crystallographic coordinates. As noted by Sheffler and Baker, PDB entry 179L corresponds to a mutant of T4 lysozyme, which was inadvertently refined with the a and b cell dimensions 10% too large (entry 177L corresponds to the same mutant refined with the correct unit cell). In the mathematical description of diffraction, the coordinates are in fractional (dimensionless) units, not in Angstroms. If a structure is refined with the wrong unit cell dimensions, the fractional coordinates will remain essentially the same and the calculated structure factor amplitudes will also change little (see the next paragraph). The consequences of the error in the cell dimensions become more apparent when the fractional coordinates are converted into absolute values (i.e., Angstroms). In the case of the lysozyme mutant, a and b were 10% too large. This meant that the refined model was “stretched” by 10% in the directions of the a and b cell edges. This in turn caused unusual cavities in the core, which were recognized by the Sheffler and Baker procedure. In terms of the crystallographic refinement, this “stretching” of the model distorts individual bond lengths and angles but this was not apparent from the standard checking procedures. The insensitivity of the refinement process to substantial errors in the cell constants is at first surprising. If one takes a particular model from the PDB and performs a series of parallel refinements where the cell constants are systematically varied, one finds that the resulting R-values increase rapidly as one moves away from the “true” cell constants. This result, however, does not imply that an error in the cell constants will necessarily be signaled by an increase in the R-factor. 
If the coordinates of the protein molecule are in Angstroms, as are the entries in the PDB, then a change Δa in one of the cell dimensions will, in effect, cause a rigid-body translation of the protein parallel to that cell edge, with magnitude between zero and ±Δa depending on the location of the molecule in the unit cell. If Δa is sufficiently large (e.g., several Angstroms), the shift in the protein coordinates may be outside the range of convergence of the refinement procedure and the R-value will remain high. If, however, the starting crystallographic coordinates are converted to fractional values before making the change in the cell constant then the R-factor will change very little and the refinement procedure will be very insensitive to the error in the cell edge. In the case of the lysozyme mutant, the error in the cell dimensions was substantial and was recognized by the cavity calculation. A smaller error would be much harder to identify. As an example, suppose one had a crystal in which all three cell dimensions were too large by 1%. (Such an error might occur, if the crystal-to-detector distance or the X-ray wavelength was specified incorrectly.) The subsequent refinement of the structure would seem normal and the error would be almost impossible to detect by the standard tests. The refined model, however, would be expanded by 1%. Such an error might be detected if the structure was compared with a close homolog but it would presumably require more than a routine calculation of the root-mean-square difference between the two structures. In general, an error in one or other of the cell dimensions of a crystalline protein would be expected to be most apparent in global features of the whole protein such as density, radius of gyration, or overall dimensions. 
In the early days of structural biology, the following question was sometimes discussed: “If one were working on the crystal structure of a heretofore unknown protein, would it be possible to generate a ‘fake’ model of the structure that would be sufficiently plausible to be accepted as the correct structure?” The general consensus at the time was that to do so would require more work than to simply determine the correct structure experimentally. Nevertheless, experience has shown that the identification of incorrect structures is not a trivial matter, has typically taken 5–10 years (Table I), and in some instances is yet to be resolved [34, 35]. It might also be noted (Table I) that the resolution of the diffraction data for the structure reports that had to be withdrawn is quite variable (4.5–2.3 Å). Furthermore, several of the structural revisions were based on data to lower resolution than the initial report. Higher resolution helps but is not, of itself, a guarantee for reliability. The experience gained from studying the proteins listed in Table I has served to highlight some of the “danger signals” that may indicate questionable crystallographic models. Early experience with ferredoxin, for example, emphasized the risk of including an excessively large number of water molecules in the crystallographic model [2]. The error with the lysozyme mutant mentioned above occurred because the cell constants and structure factor amplitudes were recorded separately [33]. It could have been avoided by the trivial change of keeping both of these quantities in the same file. For both teaching and research purposes, it would be very helpful if the PDB could maintain a directory including the coordinates and structure amplitudes for all (or as many as possible) of the proteins listed in Table I, both the initial reports and any subsequent revision. Easy access to these data would facilitate the development of new and better methods for structure validation." @default.
- W2130271547 created "2016-06-24" @default.
- W2130271547 creator A5014217022 @default.
- W2130271547 creator A5053627320 @default.
- W2130271547 date "2008-01-01" @default.
- W2130271547 modified "2023-09-26" @default.
- W2130271547 title "Sorting the chaff from the wheat at the PDB" @default.
- W2130271547 cites W1555762048 @default.
- W2130271547 cites W1575074061 @default.
- W2130271547 cites W1967690058 @default.
- W2130271547 cites W1973238993 @default.
- W2130271547 cites W1985858831 @default.
- W2130271547 cites W1992681071 @default.
- W2130271547 cites W1992829848 @default.
- W2130271547 cites W1993925574 @default.
- W2130271547 cites W1994392652 @default.
- W2130271547 cites W1998610102 @default.
- W2130271547 cites W2001672262 @default.
- W2130271547 cites W2007376635 @default.
- W2130271547 cites W2014477053 @default.
- W2130271547 cites W2030183900 @default.
- W2130271547 cites W2050879305 @default.
- W2130271547 cites W2062482701 @default.
- W2130271547 cites W2064377652 @default.
- W2130271547 cites W2081823517 @default.
- W2130271547 cites W2082562760 @default.
- W2130271547 cites W2088933990 @default.
- W2130271547 cites W2091988108 @default.
- W2130271547 cites W2092171870 @default.
- W2130271547 cites W2094303590 @default.
- W2130271547 cites W2103662266 @default.
- W2130271547 cites W2124276056 @default.
- W2130271547 cites W2134482660 @default.
- W2130271547 cites W2137827496 @default.
- W2130271547 cites W2161983628 @default.
- W2130271547 cites W2167073106 @default.
- W2130271547 cites W4240422054 @default.
- W2130271547 doi "https://doi.org/10.1002/pro.13" @default.
- W2130271547 hasPubMedCentralId "https://www.ncbi.nlm.nih.gov/pmc/articles/2708036" @default.
- W2130271547 hasPubMedId "https://pubmed.ncbi.nlm.nih.gov/19177345" @default.
- W2130271547 hasPublicationYear "2008" @default.
- W2130271547 type Work @default.
- W2130271547 sameAs 2130271547 @default.
- W2130271547 citedByCount "1" @default.
- W2130271547 crossrefType "journal-article" @default.
- W2130271547 hasAuthorship W2130271547A5014217022 @default.
- W2130271547 hasAuthorship W2130271547A5053627320 @default.
- W2130271547 hasBestOaLocation W21302715471 @default.
- W2130271547 hasConcept C111696304 @default.
- W2130271547 hasConcept C11413529 @default.
- W2130271547 hasConcept C3017879956 @default.
- W2130271547 hasConcept C31903555 @default.
- W2130271547 hasConcept C41008148 @default.
- W2130271547 hasConcept C55493867 @default.
- W2130271547 hasConcept C59822182 @default.
- W2130271547 hasConcept C65556437 @default.
- W2130271547 hasConcept C78573896 @default.
- W2130271547 hasConcept C86803240 @default.
- W2130271547 hasConceptScore W2130271547C111696304 @default.
- W2130271547 hasConceptScore W2130271547C11413529 @default.
- W2130271547 hasConceptScore W2130271547C3017879956 @default.
- W2130271547 hasConceptScore W2130271547C31903555 @default.
- W2130271547 hasConceptScore W2130271547C41008148 @default.
- W2130271547 hasConceptScore W2130271547C55493867 @default.
- W2130271547 hasConceptScore W2130271547C59822182 @default.
- W2130271547 hasConceptScore W2130271547C65556437 @default.
- W2130271547 hasConceptScore W2130271547C78573896 @default.
- W2130271547 hasConceptScore W2130271547C86803240 @default.
- W2130271547 hasLocation W21302715471 @default.
- W2130271547 hasLocation W21302715472 @default.
- W2130271547 hasLocation W21302715473 @default.
- W2130271547 hasLocation W21302715474 @default.
- W2130271547 hasOpenAccess W2130271547 @default.
- W2130271547 hasPrimaryLocation W21302715471 @default.
- W2130271547 hasRelatedWork W2004044961 @default.
- W2130271547 hasRelatedWork W2095345650 @default.
- W2130271547 hasRelatedWork W2130271547 @default.
- W2130271547 hasRelatedWork W2352429183 @default.
- W2130271547 hasRelatedWork W2374501422 @default.
- W2130271547 hasRelatedWork W2599056467 @default.
- W2130271547 hasRelatedWork W2758904623 @default.
- W2130271547 hasRelatedWork W2763686200 @default.
- W2130271547 hasRelatedWork W2998900262 @default.
- W2130271547 hasRelatedWork W4295110832 @default.
- W2130271547 isParatext "false" @default.
- W2130271547 isRetracted "false" @default.
- W2130271547 magId "2130271547" @default.
- W2130271547 workType "article" @default.