Renumbering PDB files

From Proteopedia


Revision as of 17:34, 7 July 2021

Chemical groups (residues) in atomic coordinate files (PDB files) are numbered. For polymers (protein, DNA, RNA), the amino acid and nucleotide groups are given sequence numbers. For non-polymer groups (hetero groups in PDB terminology), the numbers are arbitrary, but ideally do not overlap with the polymer sequence numbers. The wwPDB allows arbitrary numbering of polymer sequences, so entries for the same or similar macromolecules may be numbered differently. See examples at Unusual sequence numbering. Such discrepancies are confusing and frustrating when comparing structures of similar macromolecules.

One of many examples is the comparison of two structures of a bacterial cytochrome, OmcS. 6ef8 and 6nef are cryo-EM structures of the same cytochrome, whose mature length is 407 amino acids (after removal of the 25-amino-acid N-terminal signal peptide). 6ef8 is numbered 1-407, while the same residues in 6nef are numbered 26-432.
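The relationship between the two numbering schemes is a constant offset equal to the signal peptide length. A minimal Python sketch of that arithmetic follows; the function and constant names are made up for illustration and are not part of any tool described here.

```python
# Values taken from the text above.
SIGNAL_PEPTIDE_LENGTH = 25   # N-terminal residues removed from the mature protein
MATURE_LENGTH = 407          # length of mature OmcS


def mature_to_precursor(resnum, offset=SIGNAL_PEPTIDE_LENGTH):
    """Convert a mature-protein number (6ef8 style) to precursor numbering (6nef style)."""
    return resnum + offset


first = mature_to_precursor(1)
last = mature_to_precursor(MATURE_LENGTH)
print(first, last)  # 26 432
```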

Below are servers that will re-number atomic coordinate files.

PDBrenum

PDBrenum is a server that renumbers atomic coordinate files to match the numberings in the corresponding UniProt entries. PDBrenum will process both PDB file format and mmCIF file format atomic coordinate files.
PDBrenum is described in a scientific article[1], which also shows that it can be run as a Python script rather than through the server.

In the example of 6ef8 vs. 6nef, after processing by PDBrenum, the cytochromes in both files have sequence numbers 26-432, which is very helpful. Unfortunately, the authors listed the hemes (HEC) in different orders in the text of the PDB files, so their numbers still don't match.

PDB Tools Web

PDB Tools Web is a server that has many options for modifying PDB files. The complete list of operations is explained in the Manual. Operations can be applied to selected subsets of residues. Operations can be chained into sequential "pipelines". Operations affecting group numbering:

  • pdb_reres: "Renumbers the residues of the PDB file starting from a given number (default 1)."
  • pdb_shiftres: "Renumbers the residues of the PDB file by adding/subtracting a given number from the original numbering."
  • pdb_gap: "Detects gaps between consecutive residues in the sequence, both by a distance criterion or discontinuous residue numbering. Only applies for protein residues."
  • pdb_delinsertion: "Deletes insertion codes in a PDB file, shifting the residue numbering of downstream residues. Allows for picking specific residues too." (For examples with insertion codes, see Unusual sequence numbering.)
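To make the difference between these operations concrete, here is a simplified Python sketch that works on a plain list of residue numbers rather than on a PDB file. It is not the PDB Tools code; the function names are hypothetical, and the assumption that renumbering from a start value produces consecutive numbers (closing any gaps) is this sketch's reading of the description above.

```python
def shift_numbers(numbers, shift):
    """Add a constant to every residue number (the idea behind pdb_shiftres)."""
    return [n + shift for n in numbers]


def renumber_from(numbers, start=1):
    """Assign consecutive numbers from `start`, one per distinct original number
    in order of first appearance (the idea behind pdb_reres, as assumed here)."""
    mapping = {}
    renumbered = []
    for n in numbers:
        if n not in mapping:
            mapping[n] = start + len(mapping)
        renumbered.append(mapping[n])
    return renumbered


def numbering_gaps(numbers):
    """Return (before, after) pairs where consecutive numbers skip by more than 1
    (the numbering criterion of pdb_gap; the distance criterion is not shown)."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b - a > 1]


original = [5, 6, 7, 10, 11]                 # residues 8 and 9 are missing
shifted = shift_numbers(original, 25)        # gap is preserved
renumbered = renumber_from(original, 1)      # gap is closed
gaps = numbering_gaps(original)
print(shifted, renumbered, gaps)
```

Shifting keeps the gap between 7 and 10, while renumbering from 1 hides it, which is one reason shifting is the better fit when the goal is to match UniProt numbering.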

Example

6ef8, a cytochrome polymer, has 7 chains of protein with hemes (HEC). Each protein chain is numbered 1-407. To re-number the protein 26-432 (as UniProt does), we need to add 25 to each number. Here is one set of steps:

  1. Specify 6ef8, and press the Fetch button.
  2. At the "Main" menu, select pdb_selchain. Press the + button to add this operation to the pipeline.
  3. Type A in the chain ID slot.
  4. At the "Main" menu, select pdb_shiftres. Press the + button to add this operation to the pipeline.
  5. Type 25 in the shift slot.
  6. At the bottom, check Tidy.
  7. At the bottom, click the green Run button.

The output is a PDB file containing only chain A, renumbered 26-432. The HEC groups are also renumbered. To avoid renumbering those, you would have to delete them and then cut/paste from the original PDB file using a plain text editor.
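As an alternative to the cut/paste step, the same selection and shift can be sketched in a few lines of Python that renumber only ATOM records and leave HETATM records (such as HEC) untouched. This is a rough illustration, not the PDB Tools implementation: it assumes standard fixed-column PDB lines (chain ID in column 22, residue number in columns 23-26), keeps only ATOM and HETATM lines, and ignores insertion codes. The function names and the heme residue number 501 in the demonstration are hypothetical.

```python
def shift_chain(pdb_lines, chain_id, shift, records=("ATOM",)):
    """Keep coordinate lines of one chain; add `shift` to the residue number
    of lines whose record type is in `records`. Other kept lines are unchanged."""
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # this sketch drops headers, TER, ANISOU, etc.
        if line[21] != chain_id:
            continue  # column 22 (index 21) holds the chain ID
        if record in records:
            new_number = int(line[22:26]) + shift  # columns 23-26
            # Note: results above 9999 would overflow the 4-character field.
            line = line[:22] + f"{new_number:>4d}" + line[26:]
        kept.append(line)
    return kept


def demo_line(record, serial, atom, resname, chain, resnum):
    """Build a truncated PDB-style line for the demonstration (hypothetical helper)."""
    return f"{record:<6}{serial:>5} {atom:<4} {resname:>3} {chain}{resnum:>4}"


lines = [
    demo_line("ATOM", 1, " N", "ALA", "A", 1),
    demo_line("ATOM", 2, " N", "ALA", "A", 407),
    demo_line("HETATM", 3, "FE", "HEC", "A", 501),  # 501 is a made-up number
    demo_line("ATOM", 4, " N", "ALA", "B", 1),
]
result = shift_chain(lines, "A", 25)
for line in result:
    print(line)
```

Passing `records=("ATOM", "HETATM")` would shift the hetero groups as well, which corresponds to the server output described above.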


References

  1. Faezov B, Dunbrack RL Jr. PDBrenum: A webserver and program providing Protein Data Bank files renumbered according to their UniProt sequences. PLoS One. 2021 Jul 6;16(7):e0253411. PMID:34228733 doi:http://dx.doi.org/10.1371/journal.pone.0253411

Proteopedia Page Contributors and Editors

Eric Martz, Wayne Decatur
