Unknown amino acids and nucleic residues

From Proteopedia

Atomic coordinate files in the Protein Data Bank (PDB) may include atoms of unknown amino acids, designated "UNK", or of unknown nucleic residues, designated "N"[1]. As of March 2025, the totals were:

  • UNK: 1,767 entries.
  • N: 123 entries.
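As an illustration of how such residues can be spotted in a single entry, here is a minimal Python sketch that counts distinct UNK and N residues in legacy PDB-format text. It assumes the fixed-column layout of ATOM records (residue name in columns 18-20, chain ID in column 22, residue number plus insertion code in columns 23-27) and does not handle mmCIF files. The names `count_unknown_residues` and `atom_line` are hypothetical helpers written for this example, not part of any existing tool.

```python
from collections import Counter


def count_unknown_residues(pdb_text):
    """Count distinct residues named UNK or N among ATOM records.

    Assumes legacy PDB fixed-column format. A residue is identified by
    (residue name, chain ID, residue number + insertion code), so each
    residue is counted once no matter how many atoms it has.
    """
    seen = set()
    counts = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # UNK and N have record type ATOM, not HETATM
        res_name = line[17:20].strip()  # columns 18-20
        if res_name not in ("UNK", "N"):
            continue
        key = (res_name, line[21], line[22:27])  # chain, resSeq + iCode
        if key not in seen:
            seen.add(key)
            counts[res_name] += 1
    return counts


def atom_line(serial, atom, res, chain, resseq):
    """Hypothetical helper: build a minimal ATOM record with dummy coordinates."""
    return (f"ATOM  {serial:>5} {atom:<4} {res:>3} {chain}{resseq:>4}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


# Made-up sample data: two UNK residues, one N residue, one alanine.
# Note that an ATOM named "N" (backbone nitrogen) is not a residue named N.
sample = "\n".join([
    atom_line(1, " N  ", "UNK", "A", 1),
    atom_line(2, " CA ", "UNK", "A", 1),
    atom_line(3, " CA ", "UNK", "A", 2),
    atom_line(4, " P  ", "N", "B", 1),
    atom_line(5, " CA ", "ALA", "A", 3),
])
counts = count_unknown_residues(sample)
print(dict(counts))  # {'UNK': 2, 'N': 1}
```

The residue name is read from its fixed columns rather than by splitting on whitespace, because PDB-format fields can run together and because the one-letter name N is right-justified within its three-column field.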

While one might expect unknown residues to be more common in entries released in the 20th century, they have in fact become far more common since the 21st-century success of electron cryomicroscopy (cryo-EM). Cryo-EM is often used on samples with unknown components, and has a poorer median resolution than X-ray crystallography:

| Method | Total Entries | Median Resolution | UNK Entries | UNK Med. Res. | N Entries | N Med. Res. |
|---|---|---|---|---|---|---|
| X-ray crystallography | 192,742 | 2.0 Å | 538 (0.3%) | 3.0 Å | 34 (0.02%) | 2.5 Å |
| Cryo-EM | 25,538 | 3.3 Å | 1,228 (4.8%) | 3.7 Å | 89 (0.3%) | 3.2 Å |

Data in the above table are for March, 2025.
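The percentages in the table follow directly from the entry counts. The short Python sketch below recomputes them from the March 2025 figures quoted above and also derives how much more frequent UNK entries are among cryo-EM entries than among X-ray entries; the variable names are chosen for this example only.

```python
# Entry counts copied from the table above (March 2025).
totals = {"X-ray crystallography": 192_742, "Cryo-EM": 25_538}
unk = {"X-ray crystallography": 538, "Cryo-EM": 1_228}
n = {"X-ray crystallography": 34, "Cryo-EM": 89}


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


for method, total in totals.items():
    print(f"{method}: UNK {percent(unk[method], total):.2f}%, "
          f"N {percent(n[method], total):.2f}%")

# How many times more frequent are UNK entries in cryo-EM than in X-ray?
fold = (percent(unk["Cryo-EM"], totals["Cryo-EM"])
        / percent(unk["X-ray crystallography"], totals["X-ray crystallography"]))
print(f"UNK frequency, cryo-EM vs. X-ray: about {fold:.0f}-fold")
```

By these counts, an entry determined by cryo-EM is roughly 17 times as likely to contain UNK as one determined by X-ray crystallography.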

Consistent with the recent success of cryo-EM, UNK and N entries have increased recently:

| Release Date | 1976-1989 | 1990-1999 | 2000-2009 | 2010-2019 | 2020-2025 (March) |
|---|---|---|---|---|---|
| UNK Entries | 12 | 27 | 94 | 731 | 903 |
| N Entries | 0 | 1 | 13 | 28 | 81 |

The average rate of release of UNK-containing entries for 2010-2019 was 6.1 per month; for 2020-2025 (through March 24), it increased to 14.9 per month, more than twice the rate of the previous decade. X-ray entries accounted for nearly all of the (few) UNK/N entries through 2009 (131 X-ray, 2 EM). In 2010-2019, EM entries overtook X-ray entries in UNK/N cases (428 EM vs. 302 X-ray). In 2020-2025, only 12% of UNK/N entries were X-ray entries, the remainder being cryo-EM entries. No solution NMR entries contain UNK or N.
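The release-rate comparison can be checked with a few lines of arithmetic. The Python sketch below recomputes the 2010-2019 rate from the UNK count in the release-date table (731 entries over 120 months) and compares it with the 2020-2025 rate. The later rate is taken as stated in the text rather than recomputed, since the exact month count used for the partial period is not given.

```python
# 2010-2019: UNK entries from the release-date table, over a full decade.
unk_2010_2019 = 731
months_2010_2019 = 10 * 12

rate_2010s = unk_2010_2019 / months_2010_2019
print(f"2010-2019: {rate_2010s:.1f} UNK entries per month")

# 2020-2025: rate as stated in the text (assumption: not recomputed here).
rate_2020s = 14.9

ratio = rate_2020s / rate_2010s
print(f"Ratio of recent to previous rate: {ratio:.1f}")
```

The ratio comes out at a little over 2.4, consistent with "more than twice".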

FirstGlance in Jmol reports UNK and N

FirstGlance in Jmol, when you click Show More Details, alerts you to the presence of UNK or N residues, and reports the counts. Examples for UNK and N are analyzed in detail in the Notes for FirstGlance:

Notes

  1. The PDB data file format, under the heading HET, states that "HET records are used to describe non-standard residues ... that constitute part of a biological polymer and is not one of the following: standard amino acids, standard nucleic acids, or unknown amino acid (UNK) or nucleic acid (N) where UNK and N are used to indicate the unknown residue name." UNK and N have record type ATOM (not HETATM).

Proteopedia Page Contributors and Editors

Eric Martz
