Unknown amino acids and nucleic residues
From Proteopedia
Atomic coordinate files in the Protein Data Bank (PDB) may include atoms of unknown amino acids, designated "UNK", or of unknown nucleic acid residues, designated "N"[1]. As of March 2025, the numbers of such entries are:
- UNK: 1,767 entries.
- N: 123 entries.
One might expect unknown residues to be more common in entries released in the 20th century. In fact, they have become far more common in the 21st century, with the success of electron cryomicroscopy (cryo-EM). Cryo-EM is often used on samples with unknown components, and its median resolution is poorer than that of X-ray crystallography:
| Method | Total Entries | Median Resolution | UNK Entries | N Entries |
| X-ray crystallography | 192,742 | 2.0 Å | 538 (0.3%) | 34 (0.02%) |
| Cryo-EM | 25,538 | 3.3 Å | 1,228 (4.8%) | 89 (0.3%) |
Data in the above table are for March, 2025.
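The percentages in the table can be reproduced from the raw counts. The following is a minimal sketch; the dictionary layout and the names `table`, `percent`, `shares`, and `unk_enrichment` are illustrative assumptions, and only the counts come from the table.

```python
# Counts copied from the table above (March 2025).
table = {
    "X-ray crystallography": {"total": 192_742, "UNK": 538, "N": 34},
    "Cryo-EM": {"total": 25_538, "UNK": 1_228, "N": 89},
}


def percent(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


# Percentage of each method's entries that contain UNK or N.
shares = {
    method: {kind: percent(row[kind], row["total"]) for kind in ("UNK", "N")}
    for method, row in table.items()
}

for method, row in shares.items():
    print(f"{method}: UNK {row['UNK']:.2f}%, N {row['N']:.2f}%")

# How much more frequent UNK entries are among cryo-EM entries than among X-ray entries.
unk_enrichment = shares["Cryo-EM"]["UNK"] / shares["X-ray crystallography"]["UNK"]
print(f"UNK rate, cryo-EM relative to X-ray: about {unk_enrichment:.0f}x")
```

Comparing rates rather than raw counts matters here because X-ray entries outnumber cryo-EM entries by more than seven to one.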
Consistent with the rise of cryo-EM, the numbers of UNK and N entries have grown sharply since 2010:
| Release Date | 1976-1989 | 1990-1999 | 2000-2009 | 2010-2019 | 2020-2025 (March) |
| UNK Entries | 12 | 27 | 94 | 731 | 903 |
| N Entries | 0 | 1 | 13 | 28 | 81 |
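As a quick check on the table above, the per-period counts can be summed and compared with the overall totals (1,767 UNK and 123 N entries), and the share of entries released since 2010 can be computed. This is only an arithmetic sketch; the helper name `share_since` and the variable names are assumptions made for illustration.

```python
# Per-period counts copied from the table above.
periods = ["1976-1989", "1990-1999", "2000-2009", "2010-2019", "2020-2025 (March)"]
unk_entries = [12, 27, 94, 731, 903]
n_entries = [0, 1, 13, 28, 81]


def share_since(counts, first_index):
    """Percentage of all entries that fall in periods from first_index onward."""
    return 100.0 * sum(counts[first_index:]) / sum(counts)


start = periods.index("2010-2019")
unk_share = share_since(unk_entries, start)
n_share = share_since(n_entries, start)

print(f"UNK total: {sum(unk_entries)}, released since 2010: {unk_share:.1f}%")
print(f"N total: {sum(n_entries)}, released since 2010: {n_share:.1f}%")
```

The period sums match the overall totals, and roughly nine in ten UNK or N entries were released in 2010 or later.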
X-ray entries accounted for nearly all of the (few) UNK/N entries through 2009. Beginning in 2010-2019, EM entries had more UNK/N cases than did X-ray entries. In 2020-2025, only 12% of UNK/N entries were X-ray entries, the remainder being cryo-EM entries. Solution NMR entries do not have UNK or N.
FirstGlance in Jmol reports UNK and N
FirstGlance in Jmol, when you click Show More Details, alerts you to the presence of UNK or N, and reports the counts. Examples for UNK and N are analyzed in detail in the Notes for FirstGlance:
- UNK in 1fka is analyzed under Counting amino acids.
- N in 7oqe and several other examples is analyzed under Counting nucleotides.
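To illustrate what counting UNK and N residues involves, here is a minimal sketch that scans PDB-format text. It is not FirstGlance's actual method. It assumes the fixed-column layout of legacy PDB ATOM records (residue name in columns 18-20, chain ID in column 22, residue number in columns 23-26, insertion code in column 27). The function `count_unknown_residues`, the helper `atom_line`, and the sample records are hypothetical.

```python
def count_unknown_residues(pdb_text):
    """Count distinct residues named UNK or N among ATOM records."""
    seen = {"UNK": set(), "N": set()}
    for line in pdb_text.splitlines():
        # UNK and N are stored as ATOM records, not HETATM.
        if not line.startswith("ATOM  "):
            continue
        res_name = line[17:20].strip()  # columns 18-20
        if res_name in seen:
            # Identify a residue by chain ID, residue number, and insertion code.
            seen[res_name].add((line[21], line[22:26], line[26:27]))
    return {name: len(ids) for name, ids in seen.items()}


def atom_line(serial, atom, res, chain, resseq):
    """Build a simplified ATOM record with dummy coordinates (illustration only)."""
    return (f"ATOM  {serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Made-up sample: two UNK residues, one alanine, one unknown nucleotide.
sample = "\n".join([
    atom_line(1, "N", "UNK", "A", 1),
    atom_line(2, "CA", "UNK", "A", 1),
    atom_line(3, "CA", "UNK", "A", 2),
    atom_line(4, "N", "ALA", "A", 3),   # atom named N, but the residue is ALA
    atom_line(5, "P", "N", "B", 1),     # residue named N
])

counts = count_unknown_residues(sample)
print(counts)
```

Reading the residue name from its fixed columns keeps a backbone nitrogen atom, whose atom name is also "N", from being mistaken for an unknown nucleotide.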
Notes
- [1] The PDB data file format, under the heading HET, states that "HET records are used to describe non-standard residues ... that constitute part of a biological polymer and is not one of the following: standard amino acids, standard nucleic acids, or unknown amino acid (UNK) or nucleic acid (N) where UNK and N are used to indicate the unknown residue name." UNK and N have record type ATOM (not HETATM).