Missing residues and incomplete sidechains
From Proteopedia
Revision as of 22:01, 4 November 2024
This page is under construction. This notice will be removed when it is complete. Eric Martz 21:26, 29 October 2024 (UTC)
Missing Residues
In about 90%[1] of the Empirical models in the PDB, some residues (amino acids or nucleotides) that were present in the experimental material are absent (have no coordinates) in the empirical model. X-ray crystallography gives a clear electron density map only where every molecule in the protein crystal has the same conformation. Usually, some parts of the molecule vary in conformation between copies in the crystal, that is, some regions are disordered. The same may occur with protein molecules on a cryo-electron microscopy grid. These disordered portions of the molecule are not clearly resolved in the density map used to construct the structure model. Without density to guide where to place these residues, the experimenter omits them from the model. These are called missing residues. It is very common for a few residues at the ends of protein chains to be missing in the atomic model. (Example: 5 residues are missing from the carboxy terminus of the protein in 1ijw.)
To emphasize, the missing residues were present in the experimental material, but are absent in the resulting atomic model.
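In PDB-format files, missing residues are listed in REMARK 465 records. The sketch below shows one way to read that list. It assumes whitespace-separated fields and a non-blank chain ID. The function name, the variable `example_remarks`, and the residue numbers in it are illustrative, not copied from a real entry.

```python
def parse_missing_residues(pdb_text):
    """Return (model, res_name, chain, seq_id) tuples listed under REMARK 465.

    seq_id is kept as a string because an insertion code may be attached.
    """
    missing = []
    in_table = False
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        fields = line[10:].split()
        if not in_table:
            # The residue table begins after the column-header line,
            # which ends with the token SSSEQI.
            if fields and fields[-1] == "SSSEQI":
                in_table = True
            continue
        if len(fields) == 3:        # RES C SSSEQ (no model number)
            res, chain, seq = fields
            model = None
        elif len(fields) == 4:      # M RES C SSSEQ
            model, res, chain, seq = fields
        else:
            continue                # skip lines this sketch does not understand
        missing.append((model, res, chain, seq))
    return missing


# Illustrative REMARK 465 block (hypothetical numbering).
example_remarks = """\
REMARK 465 MISSING RESIDUES
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE
REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN
REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)
REMARK 465
REMARK 465   M RES C SSSEQI
REMARK 465     ASP A     1
REMARK 465     ASP A     2
REMARK 465     HIS A     3
"""

for entry in parse_missing_residues(example_remarks):
    print(entry)
```

Counting the returned tuples gives the total number of missing residues in the entry.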
Missing Ends of Chains
Unlike other viewers, FirstGlance in Jmol ensures that you are aware of missing ends by marking them with spherical "empty baskets" (see below). In the example shown below, 2ace, the 3 missing N-terminal residues DDH have net negative charge.
Figure: 2ace amino-terminus missing 3 amino acids.
Panels: FirstGlance "empty basket" (see below); AlphaFold model for P04058.
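The net charge of a missing segment such as DDH can be tallied from its one-letter sequence. This is a minimal sketch under stated assumptions: Asp and Glu count as -1, Lys and Arg as +1, His is treated as neutral, and the charged chain termini are ignored. FirstGlance's own charge report may use different rules.

```python
# Assumed side-chain charges; His is treated as neutral here.
CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def charge_summary(seq):
    """Count negative and positive side chains in a one-letter sequence."""
    neg = sum(1 for aa in seq.upper() if CHARGE.get(aa, 0) < 0)
    pos = sum(1 for aa in seq.upper() if CHARGE.get(aa, 0) > 0)
    return {"negative": neg, "positive": pos, "net": pos - neg}


print(charge_summary("DDH"))  # {'negative': 2, 'positive': 0, 'net': -2}
```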
Missing Loops
FirstGlance ensures that you are aware of missing loops with an ellipsoidal "empty basket". Other viewers use a dotted line, which is easier to overlook when viewing the entire structure. Empty baskets are easily hidden[2].
Figure: 2ace missing 5-amino-acid loop.
Panels: FirstGlance "empty basket" (see below)[3]; AlphaFold model for P04058.
See For Yourself
- 2ace in FirstGlance in Jmol
- 2ace in Mol*
- 2ace in iCn3D
- (PyMOL and ChimeraX require that you download stand-alone applications.)
AlphaFold models have no missing atoms
When all empirical models of the protein of interest have missing atoms, the best way to get a model without missing atoms is to download the AlphaFold model. The downloaded PDB file can then be uploaded to FirstGlance. iCn3D will accept the UniProt sequence ID to retrieve the AlphaFold model directly. AlphaFold models lack ligands; only empirical models will include ligands, such as inhibitors of catalysis.
The AlphaFold model will usually be nearly identical to the empirical model (you should verify this by superposition), but will include the missing residues/atoms. Structures of missing loops will be predicted by AlphaFold, but typically with lower reliability (lower pLDDT) than the remainder of the structure. Many long loops that are missing in empirical models are actually intrinsically disordered, in which case the AlphaFold prediction will have very low reliability and be meaningless.
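AlphaFold model files store the per-residue pLDDT in the B-factor column, so the reliability of a predicted loop can be checked directly from a downloaded PDB file. The sketch below assumes a single-chain, PDB-format file with standard fixed columns. The function names are illustrative, and the band labels follow the AlphaFold color key (above 90, above 70, above 50, otherwise very low).

```python
def plddt_by_residue(pdb_text):
    """Map residue number -> pLDDT, read from the B-factor column of CA atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        # Columns 13-16: atom name; 23-26: residue number; 61-66: B-factor.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt):
    """Label a pLDDT value using the AlphaFold color-key bands."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "confident"
    if plddt > 50:
        return "low"
    return "very low"


def mean_plddt(scores, first, last):
    """Average pLDDT over residues first..last (inclusive), or None if empty."""
    values = [scores[i] for i in range(first, last + 1) if i in scores]
    return sum(values) / len(values) if values else None
```

Calling `mean_plddt(scores, first, last)` with the residue range of a loop that is missing in the empirical model gives a quick indication of whether the predicted loop conformation deserves any trust.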
Why Missing Residues Matter
Where residues are completely missing in empirical models, the shape of the molecule will be incorrect, and when charged residues are missing, the distribution of charges will be incorrect. Salt bridges may be missing.
Example: 5nyp
5nyp is a bacterial protein believed to be an ancestor of the 20S proteasome[4]. All sequence numbers given below are UniProt numbering. Subtract one to match numbering in 5nyp.
- 5nyp has 3 missing loops, clustered on one face. Their lengths are 8, 10, and 12 residues (20-27, 102-111, 183-194).
- None of the 3 missing loops is predicted to be intrinsically disordered by RCSB[5]. The first missing loop is predicted to be disordered by flDPnn2a[6]; the other two are not. However, flDPnn2a also predicts disorder for 71-83, which is not missing.
- The missing loops include 7 charged amino acids (4–, 3+).
- Although the authors did not detect proteolytic activity with the substrates tested, the putative catalytic triad residues Thr2, Asp18, and Lys33 (UniProt numbers) are present.
- When present in the AlphaFold model, two of the missing loops partially obscure the putative catalytic triad.
- Two salt bridges are missing in 5nyp that are present in the AlphaFold model. UniProt Glu107 (missing) forms a salt bridge with Arg99 (present). UniProt Asp183 forms a salt bridge with Arg187; both are missing in 5nyp.
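The loop ranges and the numbering offset given above lend themselves to a small worked example. This sketch uses only the UniProt ranges and the "subtract one" rule stated in this section; the function and variable names are illustrative.

```python
# Missing loops in 5nyp, in UniProt numbering (inclusive ranges from the text).
missing_loops_uniprot = [(20, 27), (102, 111), (183, 194)]

# 5nyp numbering is UniProt numbering minus one (as stated in the text).
UNIPROT_TO_5NYP_OFFSET = -1


def loop_length(first, last):
    """Number of residues in an inclusive range."""
    return last - first + 1


def to_pdb_numbering(first, last, offset=UNIPROT_TO_5NYP_OFFSET):
    """Convert an inclusive UniProt range to 5nyp numbering."""
    return (first + offset, last + offset)


def is_missing(uniprot_pos, loops=missing_loops_uniprot):
    """True if a UniProt position falls inside one of the missing loops."""
    return any(first <= uniprot_pos <= last for first, last in loops)


for first, last in missing_loops_uniprot:
    print(first, last, loop_length(first, last), to_pdb_numbering(first, last))
```

With these ranges, `is_missing(107)`, `is_missing(183)`, and `is_missing(187)` return True, while `is_missing(99)` and the putative triad positions 2, 18, and 33 return False.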
Figure: Three Missing Loops in 5nyp.
- 5nyp: missing loops (in front) as ellipsoidal "empty baskets"[7].
- 5nyp: yellow halos on putative catalytic triad, showing proximity to missing loops (in front).
- AlphaFold-predicted structure for F8JB59. Loops missing in 5nyp are at top. Confidence color key: pLDDT >70 is "confident".
AlphaFold Prediction Superposed on 5nyp
The three missing loops are present in the AlphaFold prediction for UniProt F8JB59. FATCAT superposed the AlphaFold prediction onto all 213 alpha carbons of 5nyp with RMSD 2.4 Å. The morph between the two models shows their similarity.
Figure panels: AlphaFold Prediction Superposed on 5nyp; Morph Between Models. Missing loops at top.
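The RMSD reported for a superposition is the square root of the mean squared distance between paired atoms. The sketch below only illustrates that arithmetic for two coordinate lists that are already superposed and paired. It does not perform the alignment or superposition that FATCAT carries out, and the function name is illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Two hypothetical alpha-carbon pairs, each displaced by 2 Å along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 2), (1, 0, 2)]))  # 2.0
```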
Missing Loops Obstruct Catalytic Site
Missing loops affect the shape of the molecule. The putative catalytic site is exposed on 5nyp, but when the 3 loops are added in AlphaFold's prediction for UniProt F8JB59, they partially obstruct the putative catalytic site.
Figure: AlphaFold-Predicted Structure. Charged atoms colored by FirstGlance: Positive +, Negative –. Putative catalytic triad: yellow halos. Loops predicted by AlphaFold in dark gray.
Missing Charges in 5nyp
The three missing loops include 4 negatively charged amino acids and 3 positively charged ones. Their absence affects the distribution of charge on the surface, as shown in these electrostatic potential maps generated by iCn3D (from a link in the Views tab in FirstGlance).
Figure panels: 5nyp: 3 loops with 7 charges are missing; AlphaFold-Predicted Structure for F8JB59: No missing charges.
Missing Salt Bridges in 5nyp
Two salt bridges are missing in 5nyp as a result of two of the missing loops. At left: Missing UniProt Glu107 forms a salt bridge with Arg99 (present). At right: UniProt Asp183 forms a salt bridge with Arg187; both are missing in 5nyp. A salt bridge also forms between two of the putative catalytic triad residues, Lys31 and Asp169. (All triad residues are present in 5nyp.)
Figure: AlphaFold-Predicted Structure for F8JB59. Salt bridges rendered by FirstGlance: Positive +, Negative –. Yellow halos: Charges missing in 5nyp. Green halos: Putative catalytic triad, present in 5nyp.
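A salt bridge can be tested geometrically as a short distance between a positively charged sidechain nitrogen and a negatively charged sidechain oxygen. The sketch below assumes a 4.0 Å cutoff, which is a commonly used value and not necessarily the criterion FirstGlance applies. The atom sets, function name, and coordinates in the usage lines are likewise illustrative.

```python
import math

SALT_BRIDGE_CUTOFF = 4.0  # Å; assumed cutoff, not taken from FirstGlance

# Sidechain atoms treated as carrying the charge (an assumption of this sketch).
POSITIVE_ATOMS = {"LYS": {"NZ"}, "ARG": {"NE", "NH1", "NH2"}}
NEGATIVE_ATOMS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}


def is_salt_bridge(pos_res, neg_res, cutoff=SALT_BRIDGE_CUTOFF):
    """Each residue is (res_name, {atom_name: (x, y, z)}).

    Returns True if any charged N of pos_res lies within cutoff of any
    charged O of neg_res.
    """
    pos_name, pos_atoms = pos_res
    neg_name, neg_atoms = neg_res
    for p in POSITIVE_ATOMS.get(pos_name, ()):
        for n in NEGATIVE_ATOMS.get(neg_name, ()):
            if p in pos_atoms and n in neg_atoms:
                if math.dist(pos_atoms[p], neg_atoms[n]) <= cutoff:
                    return True
    return False


# Hypothetical coordinates, for illustration only.
arg = ("ARG", {"NH1": (0.0, 0.0, 0.0)})
glu_near = ("GLU", {"OE1": (3.0, 0.0, 0.0)})
glu_far = ("GLU", {"OE1": (5.0, 0.0, 0.0)})
print(is_salt_bridge(arg, glu_near), is_salt_bridge(arg, glu_far))  # True False
```

When either partner lies in a missing loop, its coordinates are simply absent from the empirical model, so no such test can find the bridge there.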
Incomplete Sidechains
In about 70% of the empirical models in the PDB, some residues have coordinates missing for some of their sidechain atoms, due to local disorder. Their main chain atoms are present in the model, however, leading to residues with incomplete sidechains. For example, the long sidechain of a lysine on the surface of a protein may have too blurry an electron density to indicate its position. In some cases, the model builder may give that sidechain coordinates with high temperature values, or low or zero occupancy (example: Arg321 in 2ade[8]). In other cases, the model builder simply omits the coordinates for the sidechain, so the aforementioned surface lysine may have the sidechain of an alanine (example: Lys498 in 2ace).
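An incomplete sidechain can be detected by comparing the atom names present for a residue with the heavy atoms its sidechain should contain. The sketch below uses standard PDB atom names for the 20 common amino acids; the function name is illustrative, and hydrogens and alternate locations are ignored.

```python
# Expected sidechain heavy atoms, by standard PDB atom name.
SIDECHAIN_ATOMS = {
    "ALA": ["CB"],
    "ARG": ["CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
    "ASN": ["CB", "CG", "OD1", "ND2"],
    "ASP": ["CB", "CG", "OD1", "OD2"],
    "CYS": ["CB", "SG"],
    "GLN": ["CB", "CG", "CD", "OE1", "NE2"],
    "GLU": ["CB", "CG", "CD", "OE1", "OE2"],
    "GLY": [],
    "HIS": ["CB", "CG", "ND1", "CD2", "CE1", "NE2"],
    "ILE": ["CB", "CG1", "CG2", "CD1"],
    "LEU": ["CB", "CG", "CD1", "CD2"],
    "LYS": ["CB", "CG", "CD", "CE", "NZ"],
    "MET": ["CB", "CG", "SD", "CE"],
    "PHE": ["CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ"],
    "PRO": ["CB", "CG", "CD"],
    "SER": ["CB", "OG"],
    "THR": ["CB", "OG1", "CG2"],
    "TRP": ["CB", "CG", "CD1", "CD2", "NE1", "CE2", "CE3", "CZ2", "CZ3", "CH2"],
    "TYR": ["CB", "CG", "CD1", "CD2", "CE1", "CE2", "CZ", "OH"],
    "VAL": ["CB", "CG1", "CG2"],
}


def missing_sidechain_atoms(res_name, present_atoms):
    """List expected sidechain atoms absent from present_atoms.

    Returns None for residue names this table does not cover.
    """
    expected = SIDECHAIN_ATOMS.get(res_name.upper())
    if expected is None:
        return None
    return [atom for atom in expected if atom not in present_atoms]


# A lysine modeled only as far as the gamma carbon.
print(missing_sidechain_atoms("LYS", {"N", "CA", "C", "O", "CB", "CG"}))
# ['CD', 'CE', 'NZ']
```

A lysine that returns `['CG', 'CD', 'CE', 'NZ']` has been truncated to the sidechain of an alanine, which is the case described above.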
Among the five viewers shown below, only FirstGlance alerts you to incomplete sidechains in its initial view. It marks them S—. FirstGlance and iCn3D are the only ones that show disulfide bonds in their initial views. The S— labels are easily hidden[9].
Figure: Initial views of incomplete sidechains in 2ace.
- FirstGlance marks incomplete sidechains with S—, and shows disulfide bonds.
- iCn3D shows disulfide bonds.
How to avoid overlooking missing residues or incomplete sidechains
When a PDB ID model is displayed in FirstGlance in Jmol, regions with missing residues are clearly marked with "empty baskets", as shown above. FirstGlance reports the total number missing, and the resulting number of missing charges, and offers a detailed report.
Notes & References
1. Percentages are based on searches for REMARK 465 and REMARK 470 at OCA.
2. See Hiding empty baskets.
3. The "S-" indicates that Lys491 has an incomplete sidechain, extending only to the gamma carbon.
4. Vielberg MT, Bauer VC, Groll M. On the Trails of the Proteasome Fold: Structural and Functional Analysis of the Ancestral beta-Subunit Protein Anbu. J Mol Biol. 2018 Feb 2. pii: S0022-2836(18)30007-X. PMID:29355501 doi:10.1016/j.jmb.2018.01.004
5. Erdős G, Dosztányi Z. Analyzing Protein Disorder with IUPred2A. Curr Protoc Bioinformatics. 2020 Jun;70(1):e99. PMID:32237272 doi:10.1002/cpbi.99
6. Wang K, Hu G, Basu S, Kurgan L. flDPnn2: Accurate and Fast Predictor of Intrinsic Disorder in Proteins. J Mol Biol. 2024 Sep 1;436(17):168605. PMID:39237195 doi:10.1016/j.jmb.2024.168605
7. There is a spherical blue "empty basket" on the amino terminus because Met1 is missing. This spherical empty basket was removed in the image showing the putative catalytic triad.
8. In 2ade, the temperature of the distal nitrogens in Arg321 is 79, while the average temperature is 31.
9. See Labels on Atoms: S—, X, D, ?.





