Atomic coordinate file
From Proteopedia
Revision as of 18:49, 14 June 2008
Definition
Atomic coordinate files are the data files that specify three-dimensional (3D) molecular structures. At a minimum, they must specify the position of each atom in space, typically with X, Y and Z Cartesian coordinates, and the chemical element of each atom.
Data Formats: PDB, mmCIF, etc.
Atomic coordinate files come in many data formats. The XYZ format (file type .xyz) specifies only the coordinates and chemical element of each atom, and is useful for small molecules. This format is not adequate for macromolecules because additional information is needed for their atoms.
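As a rough illustration of how little an XYZ file carries, here is a minimal Python sketch of a reader. It assumes the common XYZ layout (an atom count on the first line, a free-form comment on the second, then one "element x y z" line per atom); the function name parse_xyz and the water coordinates are made up for this example.

```python
def parse_xyz(text):
    """Parse XYZ-format text into a list of (element, x, y, z) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])  # first line: number of atoms
    # lines[1] is a free-form comment line and is ignored here
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return atoms


# Illustrative coordinates only, not taken from a real data file.
WATER_XYZ = """3
water (illustrative coordinates)
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""

atoms = parse_xyz(WATER_XYZ)
for element, x, y, z in atoms:
    print(element, x, y, z)
```

Each atom ends up as nothing more than an element symbol and three numbers, which is why the format has no place for residue names, chains, or sequence numbers.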
Macromolecular atomic coordinate files need to specify considerably more than the position of each atom in space and its chemical element. Each atom either belongs to a Standard Residue or not. If not, it is designated a hetero atom. The position of each atom within a standard residue is specified, e.g. carbon atoms in amino acids can be the carboxy carbon (C), the alpha carbon (CA), the beta carbon (CB), and so forth. Nitrogen atoms can be in the main chain (N), or on the sidechain, e.g. in the terminal zeta position in lysine (NZ). Along with the name of the residue to which an atom belongs, the file gives the name of the chain in which the residue is found and the residue's sequence number. Along with the X, Y, and Z coordinates, each atom is given an occupancy value and an isotropic B value (also called the temperature value).
PDB Format. The most popular macromolecular data format among crystallographers is the one used by the early (1970s) Protein Data Bank, called the Protein Data Bank Format or PDB Format. Data files in this format are called PDB Files. Although this format has serious limitations, it remains popular, partly because the data files are plain text and relatively easy for humans to read.
- Simple Diagram of PDB ATOM Records in the Format
- Protein Data Bank Contents Guide: Atomic Coordinate Entry Format Description
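To make the per-atom information described above concrete, the following Python sketch pulls the fields out of a single PDB ATOM record. PDB records are fixed-width, so fields are read by column position rather than by splitting on whitespace. The column ranges follow the published ATOM/HETATM layout; the function name parse_atom_record, the dictionary keys, and the sample values are assumptions made for this example and do not come from a real entry.

```python
# One sample record, built piece by piece so the column layout is visible.
# The values are illustrative, not copied from a real PDB entry.
SAMPLE_LINE = (
    "ATOM  "      # cols  1-6   record name (HETATM for hetero atoms)
    "    1"       # cols  7-11  atom serial number
    " "           # col  12     unused
    " N  "        # cols 13-16  atom name (N, CA, CB, NZ, ...)
    " "           # col  17     alternate location indicator
    "MET"         # cols 18-20  residue name
    " "           # col  21     unused
    "A"           # col  22     chain identifier
    "   1"        # cols 23-26  residue sequence number
    " "           # col  27     insertion code
    "   "         # cols 28-30  unused
    "  27.340"    # cols 31-38  X coordinate
    "  24.430"    # cols 39-46  Y coordinate
    "   2.614"    # cols 47-54  Z coordinate
    "  1.00"      # cols 55-60  occupancy
    "  9.67"      # cols 61-66  isotropic B (temperature) value
    "          "  # cols 67-76  unused here
    " N"          # cols 77-78  element symbol
)


def parse_atom_record(line):
    """Split one ATOM or HETATM line into named fields by column position."""
    record = line[0:6].strip()
    return {
        "hetero": record == "HETATM",
        "serial": int(line[6:11]),
        "atom_name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain_id": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_value": float(line[60:66]),
        "element": line[76:78].strip(),
    }


atom = parse_atom_record(SAMPLE_LINE)
print(atom)
```

Reading by column matters because adjacent fields are allowed to touch: a four-character atom name or a large coordinate can fill its field completely, leaving no whitespace to split on.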
Bonds: Connectivity
Typically, atomic coordinate files do not specify covalent bonds between atoms. Instead, molecular modeling or visualization software infers covalent bonds using simple rules: commonly, any two non-hydrogen atoms within 1.9 ångströms of each other are deemed to be covalently bonded. (The cutoff distance for a bond involving a hydrogen atom is shorter.) The PDB format does require that covalent bonds be specified for atoms that are not members of Standard Residues in protein or nucleic acid chains. These bonds are specified in CONECT records.
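The distance rule can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm of any particular viewer: the 1.9 Å cutoff is the one quoted above, while the 1.2 Å cutoff for bonds involving hydrogen is an assumed value chosen for the example, as are the function name infer_bonds and the water coordinates.

```python
import math
from itertools import combinations

HEAVY_CUTOFF = 1.9     # ångströms, the rule of thumb for non-hydrogen pairs
HYDROGEN_CUTOFF = 1.2  # ångströms, ASSUMED value for pairs involving hydrogen


def infer_bonds(atoms):
    """Return index pairs (i, j) of atoms close enough to be deemed bonded.

    atoms is a list of (element, x, y, z) tuples.
    """
    bonds = []
    for (i, a), (j, b) in combinations(enumerate(atoms), 2):
        # Use the shorter cutoff if either atom of the pair is a hydrogen.
        cutoff = HYDROGEN_CUTOFF if "H" in (a[0], b[0]) else HEAVY_CUTOFF
        if math.dist(a[1:], b[1:]) <= cutoff:
            bonds.append((i, j))
    return bonds


# Illustrative water geometry, not taken from a real data file.
water = [
    ("O", 0.00, 0.00, 0.00),
    ("H", 0.96, 0.00, 0.00),
    ("H", -0.24, 0.93, 0.00),
]

bonds = infer_bonds(water)
print(bonds)
```

Comparing every pair of atoms is fine for small molecules but scales quadratically, so software handling large macromolecules would normally restrict the search to nearby atoms, for example with a spatial grid.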