Unknown amino acids and nucleic residues
From Proteopedia
Atomic coordinate files for entries in the Protein Data Bank (PDB) may include atoms of unknown amino acids, designated "UNK", or of unknown nucleic acid residues, designated "N". As of March 2025, the numbers of such entries are:
- UNK: 1,767 entries.
- N: 123 entries.
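To make the designations concrete, here is a minimal sketch that counts unknown residues in a coordinate file. It assumes the legacy fixed-width PDB format, in which ATOM/HETATM records carry the residue name in columns 18-20; the helper name `find_unknown_residues` and the sample records are hypothetical. PDBx/mmCIF files, the PDB's primary archive format, would need a different parser.

```python
from collections import Counter

# Residue names used for unknown residues, per the text above:
# "UNK" for amino acids, "N" for nucleic acid residues.
UNKNOWN_NAMES = {"UNK", "N"}

def find_unknown_residues(pdb_lines):
    """Count distinct unknown residues in legacy PDB-format records.

    Hypothetical helper. Assumes the fixed-width PDB layout:
    columns 18-20 = residue name, column 22 = chain ID,
    columns 23-26 = residue number, column 27 = insertion code.
    """
    residues = set()
    for raw in pdb_lines:
        line = raw.rstrip("\n").ljust(80)  # pad short lines so slicing is safe
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        res_name = line[17:20].strip()  # nucleotide names are right-justified
        if res_name in UNKNOWN_NAMES:
            # One key per residue, so a residue's many atoms count once.
            residues.add((res_name, line[21], line[22:27]))
    return Counter(name for name, _chain, _resid in residues)

sample = [
    "ATOM      1  N   UNK A   1      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  UNK A   1      12.560  13.207   2.100  1.00 20.00           C",
    "ATOM      3  CA  UNK A   2      15.320  14.001   3.050  1.00 20.00           C",
    "ATOM      4  P     N B  10      20.000  10.000   5.000  1.00 30.00           P",
    "ATOM      5  CA  ALA A   3      18.000  15.000   4.000  1.00 20.00           C",
]
print(find_unknown_residues(sample))  # → Counter({'UNK': 2, 'N': 1})
```

Counting residues rather than atoms matters because an unknown residue usually contributes several atom records, as residue 1 of chain A does in the sample.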
While one might expect unknown residues to be more common in entries released in the 20th century, they are in fact far more common in entries released since the 21st-century success of electron cryomicroscopy (cryo-EM). Cryo-EM is often used on samples with unidentified components, and its median resolution is lower (a larger value in ångströms) than that of X-ray crystallography:
| Method | Total Entries | UNK Entries | N Entries | Median Resolution |
|---|---|---|---|---|
| X-ray crystallography | 192,742 | 538 (0.3%) | 34 (0.02%) | 2.0 Å |
| Cryo-EM | 25,538 | 1,228 (4.8%) | 89 (0.3%) | 3.3 Å |
Data in the table above are as of March 2025.
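The percentages in the table are each method's UNK or N entries divided by that method's total entries. The short Python check below recomputes them from the counts; the dictionary layout and the `percentages` helper are illustrative only.

```python
# Entry counts copied from the table above (PDB, March 2025).
COUNTS = {
    "X-ray crystallography": {"total": 192_742, "UNK": 538, "N": 34},
    "Cryo-EM": {"total": 25_538, "UNK": 1_228, "N": 89},
}

def percentages(counts):
    """Map each method to (UNK %, N %) relative to that method's total entries."""
    return {
        method: (100.0 * row["UNK"] / row["total"], 100.0 * row["N"] / row["total"])
        for method, row in counts.items()
    }

pct = percentages(COUNTS)
for method, (unk_pct, n_pct) in pct.items():
    print(f"{method}: UNK {unk_pct:.2f}%, N {n_pct:.3f}%")

# How much more often a cryo-EM entry contains UNK than an X-ray entry does.
ratio = pct["Cryo-EM"][0] / pct["X-ray crystallography"][0]
print(f"UNK rate, cryo-EM vs. X-ray: {ratio:.1f}x")
```

Rounded to the table's precision, these values reproduce 0.3%, 0.02%, 4.8% and 0.3%. The ratio shows that a cryo-EM entry is roughly 17 times as likely as an X-ray entry to contain UNK residues.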
Consistent with the rise of cryo-EM, the number of UNK and N entries has grown sharply in recent decades:
| Release Date | 1976-1989 | 1990-1999 | 2000-2009 | 2010-2019 | 2020-2025 (March) |
|---|---|---|---|---|---|
| UNK Entries | 12 | 27 | 94 | 731 | 903 |
| N Entries | 0 | 1 | 13 | 28 | 81 |
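Summing each row of this table gives a quick consistency check against the totals listed at the top of the page. The Python sketch below performs that check and also computes the share of entries released from 2010 onward. The list layout and the `summarize` helper are illustrative and not part of any PDB tool.

```python
# UNK and N entry counts by release period, copied from the table above.
PERIODS = ["1976-1989", "1990-1999", "2000-2009", "2010-2019", "2020-2025 (March)"]
UNK_BY_PERIOD = [12, 27, 94, 731, 903]
N_BY_PERIOD = [0, 1, 13, 28, 81]

def summarize(counts, periods, since="2010-2019"):
    """Return the total count and the percentage released from `since` onward."""
    total = sum(counts)
    start = periods.index(since)
    recent = sum(counts[start:])
    return total, 100.0 * recent / total

for label, counts in (("UNK", UNK_BY_PERIOD), ("N", N_BY_PERIOD)):
    total, recent_pct = summarize(counts, PERIODS)
    print(f"{label}: {total} entries, {recent_pct:.0f}% released since 2010")
```

The period counts add up to the overall 1,767 UNK and 123 N entries. About 92% of UNK entries and 89% of N entries were released in 2010 or later.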
X-ray entries accounted for nearly all of the (few) UNK and N entries released through 2009. Beginning in the 2010-2019 period, cryo-EM entries had more UNK/N cases than X-ray entries did. In 2020-2025, only 12% of UNK/N entries were X-ray entries, and the remainder were cryo-EM entries. Solution NMR entries contain no UNK or N residues.