Chains and Chain IDs

From Proteopedia

Revision as of 22:04, 14 February 2025

The term chain, in biochemistry, usually denotes either a polypeptide chain or a polynucleotide chain.

  • Protein Chains: A polypeptide chain is a sequence of amino acids covalently linked by peptide bonds. When longer than 50 amino acids, it is called a protein, whereas a short polypeptide consisting of 50 or fewer amino acids is termed a peptide. The chain structures of proteins are most easily visualized with backbone representations.
  • Nucleic Acid Chains: A polynucleotide chain is a sequence of nucleotides covalently linked by ribose (or deoxyribose)-phosphodiester bonds, e.g. either DNA or RNA.

Polypeptide (protein) chains are linear, with rare exceptions where side chains form protein crosslinks between two linear chains, such as disulfide bonds or, less commonly, other types of protein crosslinks such as isopeptide bonds.

Each protein chain has two ends, an amino terminus (positively charged) and a carboxy terminus (negatively charged). The first residue in a protein chain becomes the amino terminus, with new amino acids being added at the carboxy terminus. The sequence of amino acids is specified by the codons of messenger RNA, which is transcribed from the DNA gene (the messenger RNA is complementary to the template strand). The first residue in a nucleic acid chain becomes the 5' (phosphate) terminus, with new nucleotides being added at the 3' (hydroxy) terminus.

Protein molecules may consist of one or more polypeptide chains (see Protein primary, secondary, tertiary and quaternary structure). Those with more than one chain may be termed homo-oligomers or hetero-oligomers, or homo-multimers or hetero-multimers. The functional form of the molecule, termed the biological unit, often contains a different number of chains than does the crystallographic asymmetric unit. Examples are given in the article on biological units.

In a protein molecule consisting of multiple chains, the chains are usually held together by non-covalent bonds, but sometimes by covalent bonds, usually disulfide bonds. See quaternary structure.

Chain IDs

In the atomic coordinate files maintained by the wwPDB (PDB files), each polymer chain is given an ID, or chain "name". In the legacy PDB data format, chain IDs are a single letter or numeral (A-Z, a-z, 0-9), which limits the number of chains to 62. In the newer mmCIF data format (also called PDBx), chain IDs can be multiple letters or numbers, and the number of chains is unlimited.
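The 62-chain limit follows directly from the size of the legacy alphabet: 26 uppercase letters, 26 lowercase letters, and 10 digits. The following is a minimal sketch of that arithmetic, not part of any wwPDB tool; the names LEGACY_CHAIN_ID_ALPHABET, is_legacy_chain_id, and fits_legacy_format are hypothetical and chosen only for illustration.

```python
import string

# Legacy PDB format: one character per chain ID, drawn from A-Z, a-z, 0-9.
# (Hypothetical constant name, used only in this sketch.)
LEGACY_CHAIN_ID_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits
)


def is_legacy_chain_id(chain_id: str) -> bool:
    """Return True if chain_id is a single character from the legacy alphabet."""
    return len(chain_id) == 1 and chain_id in LEGACY_CHAIN_ID_ALPHABET


def fits_legacy_format(chain_ids) -> bool:
    """Return True if every chain ID in the model is a legacy-style ID.

    Because the IDs must be distinct single characters, at most
    len(LEGACY_CHAIN_ID_ALPHABET) chains can pass this check.
    """
    return all(is_legacy_chain_id(c) for c in set(chain_ids))


if __name__ == "__main__":
    print(len(LEGACY_CHAIN_ID_ALPHABET))          # size of the legacy alphabet
    print(fits_legacy_format(["A", "B", "7"]))    # single-character IDs
    print(fits_legacy_format(["A", "AA", "AB"]))  # multi-character IDs need mmCIF
```

A model whose chain IDs fail this check can only be represented in the mmCIF format.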

Non-Polymer Chain IDs

An idiosyncrasy of PDB files is that all components in the structure model are assigned chain IDs, including ligands, metal ions, and water. The procedure for assigning chain IDs is specified in the wwPDB Procedures section 6. Chain ID assignment. As of February 2025, that document needs two corrections to agree with actual wwPDB practice:

  1. When protein or nucleic acid is present, ligands and water bound to carbohydrate are never assigned the chain ID of that carbohydrate, but are given the chain ID of the nearest protein/nucleic acid, even when it is >5 Å away (examples: 7LKC, 7dc4, 8g82). When the structure is carbohydrate without any protein or nucleic acid, only then are ligands and water given the chain ID of the nearest carbohydrate (examples: 1c58, 2kqo).
  2. Although dinucleotides and disaccharides are assigned unique chain IDs, dipeptides are not. Rather, dipeptides, traditionally deemed ligands, are assigned the ID of the polymer chain to which they are bound. Trisaccharides and higher oligosaccharides are assigned unique chain IDs.

The assignment of unique names to disaccharides and oligosaccharides began with the 2020 wwPDB Remediation of Carbohydrates.
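The two corrections above can be restated as a small decision procedure. The sketch below only encodes the rules as described on this page; it is not wwPDB code, and the names Neighbor, chain_id_for_ligand_or_water, and gets_own_chain_id are hypothetical. Cases the text does not cover (for example, longer peptides or longer nucleotide chains) deliberately return None rather than guess.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Neighbor:
    """A nearby chain, as seen from a ligand or water (hypothetical type)."""
    chain_id: str
    kind: str        # "protein", "nucleic_acid", or "carbohydrate"
    distance: float  # distance in angstroms


def chain_id_for_ligand_or_water(neighbors: List[Neighbor]) -> Optional[str]:
    """Correction 1: prefer the nearest protein/nucleic acid at any distance.

    Only when no protein or nucleic acid is present does the nearest
    carbohydrate supply the chain ID.
    """
    polymers = [n for n in neighbors if n.kind in ("protein", "nucleic_acid")]
    if polymers:
        return min(polymers, key=lambda n: n.distance).chain_id
    carbohydrates = [n for n in neighbors if n.kind == "carbohydrate"]
    if carbohydrates:
        return min(carbohydrates, key=lambda n: n.distance).chain_id
    return None


def gets_own_chain_id(kind: str, n_residues: int) -> Optional[bool]:
    """Correction 2: which short oligomers receive a unique chain ID."""
    if kind == "saccharide" and n_residues >= 2:
        return True   # disaccharides, trisaccharides, and higher
    if kind == "nucleotide" and n_residues == 2:
        return True   # dinucleotides
    if kind == "peptide" and n_residues == 2:
        return False  # dipeptides take the ID of the bound polymer chain
    return None       # not covered by the text above


if __name__ == "__main__":
    # Water close to a glycan but 7.2 angstroms from the protein: protein wins.
    near = [Neighbor("A", "protein", 7.2), Neighbor("C", "carbohydrate", 2.9)]
    print(chain_id_for_ligand_or_water(near))
    print(gets_own_chain_id("peptide", 2))
```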

Author vs. wwPDB Chain IDs

Regardless of the chain IDs assigned by the authors of a structure model entry deposited in the wwPDB, the wwPDB assigns its own (usually distinct) chain IDs.
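In mmCIF files both sets of IDs are stored per atom: _atom_site.auth_asym_id holds the author-assigned chain ID and _atom_site.label_asym_id holds the wwPDB-assigned one. The following sketch tabulates how the two correspond. It assumes a simplified atom_site loop with whitespace-separated, unquoted values; the SAMPLE records are invented for illustration, and the function name label_to_auth_chain_ids is hypothetical. A real file is better read with a full mmCIF parser.

```python
# Invented, heavily abbreviated atom_site loop (not taken from a real entry).
SAMPLE = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.auth_asym_id
ATOM 1 CA GLY A H
ATOM 2 CA ALA A H
ATOM 3 CA SER B L
HETATM 4 O HOH C H
#
"""


def label_to_auth_chain_ids(mmcif_text: str) -> dict:
    """Map each wwPDB chain ID (label_asym_id) to its author chain IDs.

    Assumes one atom per line and no quoted values containing spaces.
    """
    columns = []
    mapping = {}
    for raw_line in mmcif_text.splitlines():
        line = raw_line.strip()
        if line.startswith("_atom_site."):
            # Column names appear in the same order as the row values.
            columns.append(line.split(".", 1)[1])
        elif columns and line.startswith("#"):
            break  # end of the atom_site loop
        elif columns and line and not line.startswith(("loop_", "_")):
            row = dict(zip(columns, line.split()))
            mapping.setdefault(row["label_asym_id"], set()).add(
                row["auth_asym_id"]
            )
    return mapping


if __name__ == "__main__":
    for label_id, auth_ids in sorted(label_to_auth_chain_ids(SAMPLE).items()):
        print(label_id, "->", ", ".join(sorted(auth_ids)))
```

In this invented sample, the water carries its own wwPDB chain ID (C) while its author chain ID (H) is the same as that of a polymer chain.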

Proteopedia Page Contributors and Editors

Eric Martz