Chains and Chain IDs

From Proteopedia

Current revision (13:15, 15 July 2025)

Chain Biochemistry

The term chain, in biochemistry, usually denotes either a polypeptide chain or a polynucleotide chain.

  • Protein Chains: A polypeptide chain is a sequence of amino acids covalently linked by peptide bonds. When longer than 50 amino acids, it is called a protein, whereas a short polypeptide consisting of 50 or fewer amino acids is termed a peptide. The chain structures of proteins are most easily visualized with backbone representations.
  • Nucleic Acid Chains: A polynucleotide chain is a sequence of nucleotides covalently linked by ribose (or deoxyribose)-phosphodiester bonds, forming either DNA or RNA.

Polypeptide (protein) chains are linear, with rare exceptions where side-chains form protein crosslinks between two linear chains, such as disulfide bonds, or less commonly other types of protein crosslinks, such as isopeptide bonds.

Each protein chain has two ends, an amino terminus (positively charged) and a carboxy terminus (negatively charged). The first residue in a protein chain becomes the amino terminus, with new amino acids being added at the carboxy terminus. The sequence of amino acids is specified by messenger RNA, whose codons are transcribed from, and are therefore complementary to, the template strand of the DNA gene. The first residue in a nucleic acid chain becomes the 5' (phosphate) terminus, with new nucleotides being added at the 3' (hydroxy) terminus.

Protein molecules may consist of one or more polypeptide chains (see Protein primary, secondary, tertiary and quaternary structure). Those with more than one chain may be termed homo-oligomers or hetero-oligomers, homo-multimers or hetero-multimers. Functional forms of the molecule, termed biological units, often contain a different number of chains than does the crystallographic asymmetric unit. Examples are given in the article on biological units.

In a protein molecule consisting of multiple chains, the chains are usually held together by non-covalent bonds, but sometimes by covalent bonds, usually disulfide bonds. See quaternary structure.

Chain IDs

In the atomic coordinate files maintained by the wwPDB (PDB files), each polymer chain is given an ID, or chain "name". In the legacy PDB data format, chain IDs are a single letter or numeral (A-Z, a-z, 0-9), which limits the number of chains to 62. In the newer mmCIF data format (also called PDBx), chain IDs can be up to 4 letters or numerals, so the number of chains in a single structure has no practical limitation (>10 million chains/structure could be accommodated by 4-character chain IDs).
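The limits quoted above follow from simple counting. The following is a minimal sketch, assuming that every position of a chain ID may hold any of the 62 characters A-Z, a-z, 0-9; the function name is illustrative only and is not part of any wwPDB tool.

```python
import string

# Alphabet assumed for chain IDs: upper case, lower case, digits (62 symbols).
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits


def max_chain_ids(max_length: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Count the distinct IDs of length 1..max_length over the alphabet."""
    return sum(alphabet_size ** n for n in range(1, max_length + 1))


legacy_pdb_limit = max_chain_ids(1)  # single-character IDs (legacy PDB format)
mmcif_limit = max_chain_ids(4)       # 62 + 62**2 + 62**3 + 62**4
print(legacy_pdb_limit, mmcif_limit)
```

Under this assumption the mmCIF total comes to roughly 15 million IDs, consistent with the ">10 million" figure above.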

Oligosaccharide Chain IDs

The assignment of unique chain IDs to disaccharides and oligosaccharides began with the 2020 wwPDB Remediation of Carbohydrates. Notably, monomeric nucleotides and amino acids (not part of a polymeric chain) and monosaccharides are assigned the chain ID of the nearest protein or nucleic acid, while multimeric di- or oligo-nucleotides, and di- or oligosaccharides are given unique chain IDs. See item 2 below for the special case of dipeptides vs. tri- / oligo-peptides. N-linked glycans are likely underrepresented in the PDB due to microheterogeneity and their flexibility[1].

Chain ID Assignment Policies

The procedure for assigning chain IDs is specified in the wwPDB Procedures section 6. Chain ID assignment. As of April 2025, that document needs two corrections to agree with actual wwPDB practice:

  1. When protein or nucleic acid is present, ligands and water bound to carbohydrate are almost never[2] assigned the chain ID of that carbohydrate, but are given the chain ID of the nearest protein/nucleic acid, even when it is >5 Å away (examples: 7LKC, 7dc4, 8g82). When the structure is carbohydrate without any protein or nucleic acid, only then are ligands and water given the chain ID of the nearest carbohydrate (examples: 1c58, 2kqo).
  2. Although dinucleotides and disaccharides are assigned unique chain IDs, dipeptides are not supposed to be assigned unique chain IDs. This policy is documented at the wwPDB in Procedures section "3. Polymer sequences and sequence database reference assignment". However, this policy has not been followed consistently. A search at RCSB for Polymer Entity Sequence Length = 2 and Polymer Entity Type is Protein and Return Polymer Entities finds 59 hits (April, 2025), where dipeptides were given unique author-assigned chain IDs (in both PDB format and mmCIF format files). Traditionally deemed ligands, dipeptides are supposed to be assigned the same chain ID as the polymer chain to which they are bound. Dipeptide examples are 2cyh and 1dpp; tripeptide: 4q1L. Tripeptides and longer oligopeptides are supposed to be assigned unique chain IDs. When a dipeptide is not assigned a unique chain ID, it has no SEQRES and cannot be found by a search for polymer length at RCSB (April, 2025).
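The rules described in this section and the previous one can be summarized as a small decision function. This is only a sketch of the policy as stated in this article, not wwPDB software: the class, the labels, and the function name are hypothetical, and real entries sometimes deviate from the policy (as noted for dipeptides above).

```python
from dataclasses import dataclass

# Component kinds used in this sketch (labels are illustrative, not wwPDB terms).
PEPTIDE, NUCLEOTIDE, SACCHARIDE, OTHER = "peptide", "nucleotide", "saccharide", "other"


@dataclass(frozen=True)
class Component:
    kind: str        # one of the labels above
    n_residues: int  # 1 for monomers, ligands, ions, and water


def gets_unique_chain_id(component: Component) -> bool:
    """Return True if the policy described above calls for a unique chain ID."""
    if component.kind in (NUCLEOTIDE, SACCHARIDE):
        # Di- and oligo-nucleotides and di- and oligosaccharides get unique IDs.
        return component.n_residues >= 2
    if component.kind == PEPTIDE:
        # Dipeptides are treated as ligands; tripeptides and longer get unique IDs.
        return component.n_residues >= 3
    # Ligands, metal ions, and water take the chain ID of the nearest polymer.
    return False


print(gets_unique_chain_id(Component(SACCHARIDE, 2)))  # disaccharide
print(gets_unique_chain_id(Component(PEPTIDE, 2)))     # dipeptide
```

Components for which the function returns False are expected to carry the chain ID of the nearest protein or nucleic acid, or of the nearest carbohydrate when the structure contains no protein or nucleic acid.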

Author vs. wwPDB Chain IDs

CAUTION: Because the definitions of chain attributes are sometimes unclear in the mmCIF Dictionaries, assertions in this section are the interpretations of a small sample of mmCIF files by User:Eric Martz, and might contain errors. Please report any concerns or corrections to Eric Martz.

An idiosyncrasy of mmCIF wwPDB files is that not only polymer chains of protein, nucleic acid, and carbohydrates, but all components in the structure model are assigned chain IDs, including ligands, metal ions, and water. Regardless of the chain IDs assigned by the authors of a structure model entry deposited in the wwPDB, the wwPDB assigns an additional set of (usually distinct) chain IDs. These PDBx chain IDs are present only in the mmCIF files, not in the PDB format files. In mmCIF files, there is no single place that lists all author-assigned or all wwPDB-assigned chain IDs.

  • Author-assigned chain IDs:
    • Protein & Nucleic Acids: _pdbx_poly_seq_scheme.pdb_strand_id
    • Carbohydrates: _pdbx_branch_scheme.pdb_asym_id (supersedes .auth_asym_id after 2020 remediation)
    • All else: _pdbx_nonpoly_scheme.pdb_strand_id
  • wwPDB-assigned chain IDs:
    • Protein & Nucleic Acids: _pdbx_poly_seq_scheme.asym_id
    • Carbohydrates: _pdbx_branch_scheme.asym_id
    • All else: _pdbx_nonpoly_scheme.asym_id
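To illustrate how the two sets of IDs sit side by side, the sketch below pairs wwPDB-assigned and author-assigned chain IDs from a _pdbx_poly_seq_scheme loop. The mmCIF fragment is invented and heavily trimmed (real files carry many more columns), and the parser is deliberately naive: it assumes whitespace-separated values with no quoting, so a dedicated mmCIF parser should be used for real files. The function names are hypothetical.

```python
# Invented, trimmed fragment for illustration only (not from a real entry).
FRAGMENT = """\
loop_
_pdbx_poly_seq_scheme.asym_id
_pdbx_poly_seq_scheme.mon_id
_pdbx_poly_seq_scheme.pdb_strand_id
A MET H
A LYS H
B GLU L
#
"""


def read_loop(cif_text, category):
    """Naively parse one loop_ whose values contain no quotes or spaces."""
    prefix = category + "."
    lines = [ln.strip() for ln in cif_text.splitlines()]
    rows, i = [], 0
    while i < len(lines):
        if lines[i] == "loop_" and i + 1 < len(lines) and lines[i + 1].startswith(prefix):
            i += 1
            columns = []
            # Collect the attribute names that follow the category prefix.
            while i < len(lines) and lines[i].startswith(prefix):
                columns.append(lines[i][len(prefix):])
                i += 1
            # Collect data rows until a blank line, comment, tag, or new loop.
            while i < len(lines) and lines[i] and not lines[i].startswith(("#", "_", "loop_")):
                rows.append(dict(zip(columns, lines[i].split())))
                i += 1
        else:
            i += 1
    return rows


def chain_id_map(cif_text, category="_pdbx_poly_seq_scheme", author_attr="pdb_strand_id"):
    """Map each wwPDB-assigned chain ID (asym_id) to its author-assigned ID."""
    return {row["asym_id"]: row[author_attr] for row in read_loop(cif_text, category)}


print(chain_id_map(FRAGMENT))
```

The same function could be pointed at the other two categories listed above by passing, for example, category="_pdbx_branch_scheme" with author_attr="pdb_asym_id".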

Monomeric ligands and other non-polymers that are assigned the chain name of the associated polymer are always assigned the author-assigned polymer IDs, when the model is compatible with the PDB format -- and hence always get single-character names.

When the entry is not compatible with PDB format because it has >62 chains, some author-assigned chain names will have at least two characters.

wwPDB-assigned chain IDs are all upper case letters, and are assigned systematically in this order: A-Z, AA-ZA, AB-ZB, ... AZ-ZZ, AAA-ZAA, ABA-ZBA, etc. Single-character IDs cover 26 chains. Double-letter IDs cover 26^2 = 676. Triple-letter IDs cover 26^3 = 17,576. Four-letter IDs are also allowed, covering an additional 26^4 = 456,976 IDs.
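The ordering can be reproduced with a short generator in which the first letter of the ID varies fastest, so that ZA is followed by AB. This is a sketch inferred from the examples in this article (for instance, the 8qqj entry below runs A-Z, AA-EA, FA-ZA, AB-JB), not from wwPDB source code; the function name is hypothetical.

```python
from itertools import count, islice, product
from string import ascii_uppercase


def wwpdb_style_chain_ids():
    """Yield A-Z, AA-ZA, AB-ZB, ... with the first letter varying fastest."""
    for length in count(1):
        # product() varies its last element fastest, so reverse each tuple
        # to make the first character of the ID the fastest-changing one.
        for letters in product(ascii_uppercase, repeat=length):
            yield "".join(reversed(letters))


ids = list(islice(wwpdb_style_chain_ids(), 62))
print(ids[26:31])  # the five IDs that follow Z
print(ids[52:62])  # the IDs that follow ZA
```

Taking the first 62 IDs matches the 62 wwPDB-assigned IDs listed for 8qqj: 31 for protein followed by 31 for carbohydrate.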

When a biological unit has more chains than the asymmetric unit, the additional wwPDB-assigned chain IDs are added following the above order. In the examples below, the ratios of biological unit polymer chains to asymmetric unit polymer chains are:

  • 1hho 2x: 2 -> 4 polymer chains ("polymer chains" means protein or nucleic acid only).
  • 1igt 1x: 4 -> 4 polymer chains.
  • 4nia 4x: 60 -> 240 polymer chains.
  • 8qqj 1x: 31 -> 31 polymer chains.
  • 8qhu 1x: 86 -> 86 polymer chains.
  • 7o0u 1x: 86 -> 86 polymer chains.

The numbers of polymer chains in biological units are given in the token _pdbx_struct_assembly.oligomeric_count. Symmetry operations for generating biological units are given in the category _pdbx_struct_oper_list.

Examples

These examples show that it is much easier to work with author-assigned chain IDs than with wwPDB-assigned chain IDs, despite the latter being systematically assigned. Luckily, Jmol & JSmol, iCn3D, and ChimeraX use author-assigned chain IDs. PyMOL selects single-character author-assigned chain IDs, but reports both types of chain IDs. Molstar reports both chain IDs.

Each entry gives the PDB ID; the protein, method, and resolution (Å); the author-assigned unique chain IDs; the wwPDB-assigned unique chain IDs; and notes.

  • 1hho: Hemoglobin, X-ray 2.1
    • Author-assigned unique chain IDs: 2 IDs total: A,B
    • wwPDB-assigned unique chain IDs: 8 IDs total: Protein A,B. Other: C,D,E,F,G (PO4, HEM, OXY) and H (HOH).
  • 1igt: Antibody, X-ray 2.8
    • Author-assigned unique chain IDs: 6 IDs total: Protein: A,B,C,D. Carbohydrate: E,F.
    • wwPDB-assigned unique chain IDs: Same
  • 4nia: Virus capsid, X-ray 1.8
    • Author-assigned unique chain IDs: 60 IDs total: 15 Protein: A-O. 45 RNA: 1-8, P-Z, a-z.
    • wwPDB-assigned unique chain IDs: 117 IDs total: 15 Protein: A,E,I,M,Q,U,Y,CA,GA,KA,OA,SA,WA,AB,EB. 45 RNA: B-D,F-H,J-L,N-P,R-T,V-X,Z,AA,BA,DA-FA,HA-JA,LA-NA,PA-RA,TA-VA,XA,YA,AB-DB,FB-HB. 7 Other (Mg, PO4, SO4): IB-KB, LB-OB. 50 Water: PB-ZB, AC-ZC, AD-MD.
    • Notes: 2,160 amino acids. 330 ribonucleotides. 3,381 waters.
  • 8qqj: Type IV pilus, EM 2.6
    • Author-assigned unique chain IDs: 62 IDs total: 31 Protein: A-Z, a-e. 31 Carbohydrate: 0-9, f-z.
    • wwPDB-assigned unique chain IDs: 62 IDs total: 31 Protein: A-Z, AA-EA. 31 Carbohydrate: FA-ZA, AB-JB.
    • Notes: 3,441 amino acids. 31 serine-linked tetrasaccharides.
  • 8qhu: Ribosome, EM 2.7
    • Author-assigned unique chain IDs: 86 IDs total*: 76 Protein: A-Z, SA-SZ, Sa-Sh, a-p. 10 RNA: 1-8, S1, S4. 29 Other: none unique.
    • wwPDB-assigned unique chain IDs: 626 IDs total: 76 Protein: A-Z, AA-GA, JA, NA-ZA, AB-ZB, AC-CC. 10 RNA: HA-IA, KA-MA, DC-HC. 530 K, Na, Mg: IC-ZC, AD-ZD, AE-ZE, ... AQ-ZQ, AR-IR. 10 Water: JR-ZR, AS.
    • Notes: 11,268 amino acids. 5,585 ribonucleotides. 367 waters.
  • 7o0u: Double ring photosystem, EM 2.4
    • Author-assigned unique chain IDs: 88 IDs total*: 86 Protein: AA-AX, BA-BX, C,C1,H1,H2,L,M, aa-ap, ba-bp. 2 Carbohydrate: CG, MG. Non-polymer (0V9, BCL, BPH, CRT, HEC, LMT, MQ8, V79): none unique. Water: none unique.
    • wwPDB-assigned unique chain IDs: 492 IDs total: 86 Protein: A-Z, AA-ZA, AB-ZB, AC-HC. 2 Carbohydrate: IC, JC. 342 Non-polymer: KC-ZC, AD-ZD, AE-ZE, ... AO-NO. 62 Water: OO-ZO, AP-ZP, AQ-XQ.
    • Notes: 4,972 amino acids. 4 monosaccharides. 429 non-polymer groups. 491 waters.

* Not compatible with PDB format because of its 62-chain limit (A-Z, a-z, 0-9). Some author-assigned chain IDs must have at least 2 characters.

AlphaFold3 Chain IDs

The AlphaFold Server, which in 2025 uses AlphaFold3[3], predicts complexes with multiple chains of protein and/or nucleic acid, plus a limited set of ligands and metal ions, and a wide range of post-translational modifications of amino acids and chemical modifications of nucleotides (see How to predict structures with AlphaFold). Predicted models are available in mmCIF format only (not PDB format, although the mmCIF files can be easily converted to PDB format). Consistent with the chain ID assignment policies of the wwPDB for mmCIF files, every entity is assigned a unique chain ID, including polymer chains, ligands, metal ions, and glycans including oligo- and monosaccharides. The following three examples are provided by the Server.

Each entry gives the PDB ID; the protein; the author-assigned chain IDs in PDB & mmCIF files; the AlphaFold3 Server-assigned chain IDs; and notes.

  • 7bbv: Pectate lyase B, Biological Unit 1
    • Author-assigned chain IDs in PDB & mmCIF files: 1 ID total: Protein, Zn++, Monomeric mannose glycoconjugates: A.
    • AlphaFold3 Server-assigned chain IDs: 8 IDs total: Protein: A. 1 Zn++: B. 6 Monomeric mannose glycoconjugates: C,D,E,F,G,H.
    • Notes: Mannoses are conjugated to 5 threonines and 1 serine.
  • 7rce: Synthetic constructs
    • Author-assigned chain IDs in PDB & mmCIF files: 3 IDs total: Protein, Ca++, Na+: A. DNA: B, C.
    • AlphaFold3 Server-assigned chain IDs: 7 IDs total: Protein: A. 3 Ca++: B, C, D. 1 Na+: E. DNA: F, G.
    • Notes: The 2.4 Å X-ray model has 85 amino acid sidechains missing distal atoms, including 50 charged residues, and is completely missing a small loop of 4 residues that includes one positive charge. These missing atoms are all present in the AlphaFold3-predicted model.
  • 8aw3: tRNA Deaminase
    • Author-assigned chain IDs in PDB & mmCIF files: 3 IDs total: tRNA: 1. Protein + 1 Zn++: 2. Protein + 1 Zn++: 3.
    • AlphaFold3 Server-assigned chain IDs: 5 IDs total: Protein: A, B. Zn++: C, D. tRNA: E.

Notes

  1. Gazaway E, Kandel R, Grant OC, Woods RJ. Are N-linked glycans intrinsically disordered? Curr Opin Struct Biol. 2025 Jul 10;93:103118. PMID:40645091 doi:10.1016/j.sbi.2025.103118
  2. An exception is Ca318 in 3gzt, which is assigned chain ID X. Chain X is a disaccharide. In the asymmetric unit, Ca318 is 44 Å from the disaccharide, but only 26 Å from the nearest protein (chain Q). In Biomolecule 4, it is 2.6 Å from sidechain oxygens of Asp231 in chain B, and in the vicinity of two other chain B Asp.
  3. Abramson J, Adler J, Dunger J, Evans R, Green T, Pritzel A, Ronneberger O, Willmore L, Ballard AJ, Bambrick J, Bodenstein SW, Evans DA, Hung CC, O'Neill M, Reiman D, Tunyasuvunakool K, Wu Z, Žemgulytė A, Arvaniti E, Beattie C, Bertolli O, Bridgland A, Cherepanov A, Congreve M, Cowen-Rivers AI, Cowie A, Figurnov M, Fuchs FB, Gladman H, Jain R, Khan YA, Low CMR, Perlin K, Potapenko A, Savy P, Singh S, Stecula A, Thillaisundaram A, Tong C, Yakneen S, Zhong ED, Zielinski M, Žídek A, Bapst V, Kohli P, Jaderberg M, Hassabis D, Jumper JM. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature. 2024 Jun;630(8016):493-500. PMID:38718835 doi:10.1038/s41586-024-07487-w

Proteopedia Page Contributors and Editors

Eric Martz