Unknown amino acids and nucleic residues
From Proteopedia
Atomic coordinate entries in the Protein Data Bank (PDB) may include atoms of unknown amino acids, designated "UNK", or of unknown nucleic acid residues, designated "N"[1]. As of March 2025, the numbers of entries containing them are:
- UNK: 1,767 entries.
- N: 123 entries.
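The PDB format writes UNK and N as ordinary ATOM records (see the note at the end), so such residues can be found by scanning the residue-name field. Below is a minimal Python sketch, assuming a legacy fixed-column PDB-format file rather than PDBx/mmCIF. The function name `count_unknown_residues` is hypothetical, and the sketch counts residues within one file, not entries across the archive.

```python
from collections import Counter


def count_unknown_residues(pdb_lines):
    """Count distinct UNK and N residues in legacy PDB-format text.

    Sketch only: assumes fixed-column ATOM records and ignores HETATM,
    since the PDB format writes UNK and N as ATOM records.
    """
    seen = set()
    counts = Counter()
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue
        res_name = line[17:20].strip()  # residue name, columns 18-20
        if res_name not in ("UNK", "N"):
            continue
        # A residue is identified by chain ID (col 22), residue number
        # (cols 23-26) and insertion code (col 27).
        key = (res_name, line[21:22], line[22:26], line[26:27])
        if key not in seen:
            seen.add(key)
            counts[res_name] += 1
    return counts
```

For a locally saved file (the name `entry.pdb` is hypothetical), `count_unknown_residues(open("entry.pdb"))` returns a `Counter` keyed by `"UNK"` and `"N"`. Entries too large for the legacy format are distributed only as PDBx/mmCIF, which this sketch does not read.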
While one might expect unknown residues to be more common in entries released in the 20th century, they are in fact far more common in entries released since the 21st-century success of electron cryomicroscopy (cryo-EM). Cryo-EM is often applied to samples with unknown components, and its median resolution is poorer than that of X-ray crystallography:
| Method | Total entries | Median resolution | UNK entries | UNK median resolution | N entries | N median resolution |
|---|---|---|---|---|---|---|
| X-ray crystallography | 192,742 | 2.0 Å | 538 (0.3%) | 3.0 Å | 34 (0.02%) | 2.5 Å |
| Cryo-EM | 25,538 | 3.3 Å | 1,228 (4.8%) | 3.7 Å | 89 (0.3%) | 3.2 Å |
Data in the above table are as of March 2025.
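The percentages in parentheses are each count divided by that method's total entries. A short Python check, using only the numbers in the table (the names `TABLE` and `percent_with_unknowns` are illustrative), reproduces them and shows how much more often UNK appears in cryo-EM entries:

```python
# Entry counts copied from the March 2025 table above.
TABLE = {
    "X-ray crystallography": {"total": 192_742, "UNK": 538, "N": 34},
    "Cryo-EM": {"total": 25_538, "UNK": 1_228, "N": 89},
}


def percent_with_unknowns(table):
    """Return, per method, the percentage of entries containing UNK or N."""
    return {
        method: {kind: 100.0 * row[kind] / row["total"] for kind in ("UNK", "N")}
        for method, row in table.items()
    }


pct = percent_with_unknowns(TABLE)
for method, kinds in pct.items():
    print(f"{method}: UNK {kinds['UNK']:.2f}%, N {kinds['N']:.3f}%")

# How many times larger the UNK fraction is for cryo-EM than for X-ray.
ratio = pct["Cryo-EM"]["UNK"] / pct["X-ray crystallography"]["UNK"]
print(f"Cryo-EM/X-ray UNK ratio: {ratio:.1f}")
```

The ratio comes out at roughly 17: UNK-containing entries make up about 17 times as large a fraction of cryo-EM entries as of X-ray entries.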
Consistent with the rise of cryo-EM, the number of UNK and N entries has grown sharply in recent years:
| Release date | 1976-1989 | 1990-1999 | 2000-2009 | 2010-2019 | 2020-2025 (March) |
|---|---|---|---|---|---|
| UNK entries | 12 | 27 | 94 | 731 | 903 |
| N entries | 0 | 1 | 13 | 28 | 81 |
The average rate of release of UNK-containing entries was 6.1 per month for 2010-2019; for 2020 through March 24, 2025, it rose to 14.9 per month, more than twice the rate of the previous decade. Through 2009, X-ray entries accounted for nearly all of the (few) UNK/N entries (131 X-ray, 2 EM). In 2010-2019, EM entries had more UNK/N cases than X-ray entries (428 EM vs. 302 X-ray). In 2020-2025, only 12% of UNK/N entries were X-ray entries; the remainder were cryo-EM entries. No solution NMR entries contain UNK or N.
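The monthly rate for 2010-2019 and the "more than twice" comparison follow from the release-date table by simple arithmetic. In this Python sketch the 2020-2025 rate of 14.9 per month is taken from the text rather than recomputed, and the variable names are illustrative:

```python
# Entry counts by release period, copied from the table above.
UNK_BY_PERIOD = {"1976-1989": 12, "1990-1999": 27, "2000-2009": 94,
                 "2010-2019": 731, "2020-2025(March)": 903}
N_BY_PERIOD = {"1976-1989": 0, "1990-1999": 1, "2000-2009": 13,
               "2010-2019": 28, "2020-2025(March)": 81}

# The period counts should add up to the March 2025 totals.
total_unk = sum(UNK_BY_PERIOD.values())
total_n = sum(N_BY_PERIOD.values())
print(total_unk, total_n)  # prints: 1767 123

# 2010-2019 spans 10 years = 120 months.
rate_2010s = UNK_BY_PERIOD["2010-2019"] / (10 * 12)

# Monthly rate for 2020-2025 as quoted in the text (not recomputed here).
RATE_2020S_QUOTED = 14.9
speedup = RATE_2020S_QUOTED / rate_2010s

print(f"2010-2019: {rate_2010s:.1f} UNK entries/month")
print(f"2020-2025 vs 2010-2019: {speedup:.2f}x")
```

The period counts sum to the totals given at the top of the page (1,767 UNK and 123 N entries), and 731 entries over 120 months gives the 6.1 per month quoted above.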
FirstGlance in Jmol reports UNK and N
When you click Show More Details, FirstGlance in Jmol alerts you to the presence of UNK or N residues and reports their counts. Examples of UNK and N are analyzed in detail in the Notes for FirstGlance:
- 1FKA contains 974 UNK amino acids. Most are modeled as alpha carbons alone, 174 as alanine, and 5 as main-chain atoms only. 1FKA is analyzed under Counting amino acids.
- 7OQE contains 53 N nucleotides, modeled as phosphoriboses (without bases). This entry and several other examples are analyzed under Counting nucleotides.
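How FirstGlance arrives at these categories is not described here. As a hypothetical sketch of how such a breakdown could be computed, the Python below groups the atoms of each UNK or N residue in a legacy PDB-format file and compares the atom names with reference sets. The category names, the reference sets, and the function names are all assumptions for illustration.

```python
from collections import Counter, defaultdict

# Reference atom-name sets (PDB naming). These categories are illustrative
# assumptions, not FirstGlance's actual rules.
MAIN_CHAIN = {"N", "CA", "C", "O"}
ALANINE = MAIN_CHAIN | {"CB"}
# Phosphate and ribose atoms; O1P/O2P/O3P are older names for OP1/OP2/OP3.
PHOSPHORIBOSE = {"P", "OP1", "OP2", "OP3", "O1P", "O2P", "O3P",
                 "O5'", "C5'", "C4'", "O4'", "C3'", "O3'", "C2'", "O2'", "C1'"}


def atoms_by_residue(lines, res_name):
    """Map (chain, residue number, insertion code) -> set of atom names."""
    residues = defaultdict(set)
    for line in lines:
        if line.startswith("ATOM  ") and line[17:20].strip() == res_name:
            key = (line[21:22], line[22:26], line[26:27])
            residues[key].add(line[12:16].strip())  # atom name, cols 13-16
    return residues


def classify_unk(names):
    """Describe how completely an UNK residue is modeled."""
    if names == {"CA"}:
        return "alpha carbon only"
    if names == MAIN_CHAIN:
        return "main chain only"
    if names == ALANINE:
        return "alanine"
    return "other"


def classify_n(names):
    """Report whether an N residue has atoms beyond the sugar-phosphate."""
    return "phosphoribose only" if names <= PHOSPHORIBOSE else "other atoms present"


def summarize(pdb_lines):
    lines = list(pdb_lines)  # read once so a file handle can be scanned twice
    unk = Counter(classify_unk(a) for a in atoms_by_residue(lines, "UNK").values())
    nuc = Counter(classify_n(a) for a in atoms_by_residue(lines, "N").values())
    return unk, nuc
```

In this sketch a residue modeled with only some sugar-phosphate atoms also counts as "phosphoribose only"; a real tool would likely draw finer distinctions.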
Notes
- The PDB data file format, under the heading HET, states that "HET records are used to describe non-standard residues ... that constitute part of a biological polymer and is not one of the following: standard amino acids, standard nucleic acids, or unknown amino acid (UNK) or nucleic acid (N) where UNK and N are used to indicate the unknown residue name." UNK and N have record type ATOM (not HETATM).