A DNA structural alphabet provides new insight into DNA flexibility
Bohdan Schneider, Paulina Bozikova, Iva Necasova, Petr Cech, Daniel Svozil and Jiri Cerny [1]
Molecular Tour
DNA is a structurally plastic molecule, and its biological function is enabled by adaptation to its binding partners. To identify the DNA structural polymorphisms that are possible in such adaptations, the dinucleotide structures of 60 000 DNA steps from sequentially nonredundant crystal structures were classified and an automated protocol assigning 44 distinct structural (conformational) classes called NtC (for Nucleotide Conformers) was developed.
To further facilitate understanding of the DNA structure, structurally similar NtC classes were grouped into 11 letters of a DNA structural alphabet CANA (Conformational Alphabet of Nucleic Acids) and the projection of CANA onto the graphical representation of the molecular structure was proposed. The DNA structural alphabet CANA makes the analysis of DNA structure more comprehensible yet does not compromise the impartiality of the structural description and provides a tool to characterize the DNA structure beyond a rough classification into BI-, BII-, A- and Z-DNA types. The NtC classification was further used to define a validation score called confal, which quantifies the conformity between an analyzed structure and the geometries of NtC.
NtC, CANA and confal assignment, which is accessible at the website https://dnatco.org [2], allows the quantitative assessment and validation of DNA structures and their subsequent analysis by means of pseudo-sequence alignment. We believe that the NtC and CANA assignment protocol will contribute to understanding DNA structures by their impartial characterization, help to refine and validate DNA crystal and NMR structures, interpret DNA molecular modelling, and facilitate challenging analyses of sequence-dependent features of DNA structures and their interactions with proteins.
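To make the idea of pseudo-sequence alignment concrete, here is a minimal sketch that treats a DNA strand as a sequence of CANA letters and aligns two such sequences with a plain Needleman-Wunsch global alignment. The scoring values (`match`, `mismatch`, `gap`), the function name `align` and the `---` gap token are assumptions chosen for illustration; the text does not specify how the alignment at dnatco.org is actually performed.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two lists of CANA letters (hypothetical scoring)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("---"); i -= 1
        else:
            out_a.append("---"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


# Two made-up pseudo-sequences built from CANA letters named in the text.
strand_1 = ["BBB", "BBB", "BB2", "AAA"]
strand_2 = ["BBB", "BB2", "AAA"]
total, row_1, row_2 = align(strand_1, strand_2)
print(total)
print(" ".join(row_1))
print(" ".join(row_2))
```

Because each CANA letter is a multi-character token, the sequences are kept as lists rather than strings. A realistic scheme would likely score substitutions between structurally similar letters more leniently than between dissimilar ones, which this uniform mismatch penalty does not attempt.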
Gallery of the CANA letters. Each letter is illustrated by 40 randomly selected step structures from the golden set, all drawn from a single NtC class belonging to that letter. A-like and mixed B/A conformers are drawn in pink and violet, B-like conformers in blue. The CANA letter AAA is represented by NtC AA00, BBB by BB00, and BB2 by BB07.
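The relationship between NtC classes and CANA letters is a many-to-one lookup, which the following minimal sketch illustrates. It contains only the three NtC-to-letter pairs named in the caption above; the full table of 44 classes and 11 letters is not reproduced here, and the `???` placeholder and the function names are hypothetical.

```python
from collections import Counter

# Only these three pairs are stated in the text; the complete
# 44-class table would have to come from the original publication.
NTC_TO_CANA = {"AA00": "AAA", "BB00": "BBB", "BB07": "BB2"}

# Hypothetical placeholder for any class missing from this partial table.
UNKNOWN = "???"


def to_cana(ntc_classes):
    """Translate per-step NtC labels into CANA letters."""
    return [NTC_TO_CANA.get(ntc, UNKNOWN) for ntc in ntc_classes]


def letter_counts(ntc_classes):
    """Count how often each CANA letter occurs in a list of steps."""
    return Counter(to_cana(ntc_classes))


# A made-up run of steps; NS04 is named in the text, but its letter is not.
steps = ["BB00", "BB00", "BB07", "AA00", "NS04"]
print(to_cana(steps))
print(letter_counts(steps))
```

Collapsing classes into letters this way gives a coarser but more readable description of a strand, which is the trade-off the alphabet is meant to provide.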
Annotation of the conformational properties of a few archetypal types of DNA structures revealed some unexpected features. The Dickerson–Drew dodecamer (for example PDB entry 1bna), which is often considered to be a typical B-DNA duplex, is conformationally rich, with a [...]. Our analysis of duplex models based on fibre-diffraction data reveals the need to evaluate them critically before they are used for computer modelling. Conformational analysis of guanine quadruplexes (for example PDB entry 1jpq) demonstrates the universality of the most frequent B conformer, BB00, which builds the tetrad cores of these folded DNAs in combination with [...]. A Holliday junction (for example PDB entry 1dcw) is a DNA intermediate in homologous recombination. It contains four double-stranded DNA arms joined together by short links. The stems of the junctions are [...]. The NtC class that can be associated with the junction proper is the recurrently occurring unstacked NS04.
The ∼21% of steps that are left unassigned in our procedure represent a compromise between the accuracy of the assignment and the complex nature of the DNA conformational space. A small percentage of the currently unassigned steps may subsequently be classified as new NtC classes. These classes may be quite important for understanding the detailed architecture of folded DNA, such as turns in hairpin structures or quadruplexes and still uncharacterized conformer(s) describing the i-motif fold, but they will most likely be numerically small. Undoubtedly, a fair number of the unassigned steps originate from refinement errors, but even error-free structures will have a significant number of uncharacterized conformers because of the high deformability of DNA molecules.