Renumbering PDB files
From Proteopedia
Chemical groups (residues) in atomic coordinate files (PDB files) are numbered. For polymers (protein, DNA, RNA), the amino acid and nucleotide groups are given sequence numbers. For non-polymer groups (hetero groups in PDB terminology), the numbers are arbitrary. The wwPDB also allows arbitrary numbering of polymer sequences, so the same residue can carry different numbers in different entries. See examples at Unusual sequence numbering. Such discrepancies in numbering are confusing and frustrating when comparing structures of similar macromolecules.
One of many examples arises when comparing structures of the bacterial cytochrome OmcS. 6ef8 and 6nef are cryo-EM structures of the same cytochrome, mature length 407 amino acids (after removal of the N-terminal signal peptide, length 25 amino acids). 6ef8 is numbered 1-407, while the same residues in 6nef are numbered 26-432, a constant offset of 25.
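The offset between the two numbering schemes is simple arithmetic. As a minimal sketch (the function name `numbering_offset` is hypothetical and not part of any server or library mentioned on this page), the shift can be computed and sanity-checked in Python:

```python
def numbering_offset(first_a, last_a, first_b, last_b):
    """Return the constant shift that maps numbering A onto numbering B.

    Raises ValueError if the two ranges do not cover the same number of
    residues, because then no single constant offset can relate them.
    """
    if (last_a - first_a) != (last_b - first_b):
        raise ValueError("ranges differ in length; no constant offset exists")
    return first_b - first_a


# 6ef8 is numbered 1-407; the same residues in 6nef are numbered 26-432.
offset = numbering_offset(1, 407, 26, 432)
print(offset)  # 25
```

The result equals the length of the removed signal peptide, which is why adding 25 to the 6ef8 numbers reproduces the 6nef numbering.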
Below are servers that will re-number atomic coordinate files.
PDBrenum
PDBrenum is a server that renumbers atomic coordinate files to match the numberings in the corresponding UniProt entries. PDBrenum will process both PDB file format and mmCIF file format atomic coordinate files.
In the example of 6ef8 vs. 6nef, after processing by PDBrenum, the cytochromes in both files have sequence numbers 26-432, which is very helpful. Unfortunately, the authors listed the hemes (HEC) in different orders in the text of the PDB files, so their numbers still don't match.
PDB Tools Web
PDB Tools Web is a server that has many options for modifying PDB files. The complete list of operations is explained in the Manual. Operations can be applied to selected subsets of residues. Operations can be chained into sequential "pipelines". Operations affecting group numbering:
- pdb_reres: "Renumbers the residues of the PDB file starting from a given number (default 1)."
- pdb_shiftres: "Renumbers the residues of the PDB file by adding/subtracting a given number from the original numbering."
- pdb_gap: "Detects gaps between consecutive residues in the sequence, both by a distance criterion or discontinuous residue numbering. Only applies for protein residues."
- pdb_delinsertion: "Deletes insertion codes in a PDB file, shifting the residue numbering of downstream residues. Allows for picking specific residues too." (For examples with insertion codes, see Unusual sequence numbering.)
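The difference between shifting and renumbering is easiest to see on a bare list of residue numbers. The following is a conceptual sketch only: it is not the PDB Tools implementation, the function names are hypothetical, and it assumes one number per residue with no insertion codes.

```python
def shift_numbers(numbers, delta):
    """Add a constant to every residue number (the idea behind pdb_shiftres)."""
    return [n + delta for n in numbers]


def renumber_from(numbers, start=1):
    """Assign consecutive numbers from `start` (the idea behind pdb_reres)."""
    return list(range(start, start + len(numbers)))


def numbering_gaps(numbers):
    """Return (before, after) pairs where the numbering is discontinuous."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


# A made-up chain in which residues 8 and 9 are missing.
original = [5, 6, 7, 10, 11]
print(shift_numbers(original, 25))  # [30, 31, 32, 35, 36]
print(renumber_from(original))      # [1, 2, 3, 4, 5]
print(numbering_gaps(original))     # [(7, 10)]
```

Shifting preserves gaps in the numbering, whereas renumbering from a start value closes them, so shifting is usually the safer choice when the goal is to match an external reference such as UniProt.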
Example
6ef8, a cytochrome polymer, has 7 chains of protein with hemes (HEC). Each protein chain is numbered 1-407. To re-number the protein 26-432 (as UniProt does), we need to add 25 to each number. Here is one set of steps:
- Specify 6ef8, and press the Fetch button.
- At the "Main" menu, select pdb_selchain. Press the + button to add this operation to the pipeline.
- Type A in the chain ID slot.
- At the "Main" menu, select pdb_shiftres. Press the + button to add this operation to the pipeline.
- Type 25 in the shift slot.
- At the bottom, check Tidy.
- At the bottom, click the green Run button.
The output is a PDB file containing only chain A, renumbered 26-432. The HEC groups are also renumbered. To avoid renumbering those, you would have to delete them from the output and then paste the original HEC records back in from the original PDB file, using a plain text editor.
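An alternative to manual cut/paste is a short script that shifts only the polymer records and leaves hetero groups alone. The sketch below is not part of PDB Tools Web. It assumes the legacy fixed-column PDB format (chain ID in column 22, residue number in columns 23-26) and assumes that the groups to be left untouched, such as HEC, are stored as HETATM records. The names `shift_chain_residues` and `toy_line` are hypothetical, and the sample lines are made-up data, not real 6ef8 coordinates.

```python
def shift_chain_residues(pdb_text, chain_id, delta,
                         records=("ATOM", "ANISOU", "TER")):
    """Add `delta` to residue numbers of the chosen records in one chain.

    HETATM lines are not in `records` by default, so hetero groups keep
    their original numbers.
    """
    out = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if (record in records and len(line) >= 26
                and line[21] == chain_id and line[22:26].strip()):
            new_num = int(line[22:26]) + delta
            if not -999 <= new_num <= 9999:
                raise ValueError("residue number does not fit in 4 columns")
            # Rewrite only columns 23-26; everything else is kept verbatim.
            line = f"{line[:22]}{new_num:>4}{line[26:]}"
        out.append(line)
    return "\n".join(out)


def toy_line(record, serial, name, resname, chain, resseq):
    """Build a fixed-column coordinate line with dummy coordinates (0, 0, 0)."""
    return (f"{record:<6}{serial:>5} {name:<4} {resname:>3} {chain}"
            f"{resseq:>4}    {0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


sample = "\n".join([
    toy_line("ATOM", 1, "N", "ALA", "A", 1),       # protein, chain A
    toy_line("HETATM", 2, "FE", "HEC", "A", 501),  # heme, chain A
    toy_line("ATOM", 3, "N", "ALA", "B", 1),       # protein, chain B
])
shifted = shift_chain_residues(sample, "A", 25)
print([l[22:26].strip() for l in shifted.splitlines()])  # ['26', '501', '1']
```

One caveat of this design: modified amino acids that are stored as HETATM records within the protein chain would also be skipped, so the output should be checked for such residues.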
