Structure of E. coli DnaC helicase loader

From Proteopedia

(Difference between revisions)
(Confirmation of Homology Model By Related Structures)
Current revision (18:24, 2 June 2025)
 
(15 intermediate revisions not shown.)
Line 1: Line 1:
 +
<table style="background-color:#ffe0e0"><tr><td>
 +
Since this homology model analysis was done in 2008-2012, this structure has been solved by [[cryo-EM]] (see [[6qem]]). Nevertheless, the story below remains a testament to the effort involved in homology modeling before the structure was solved, and before it was easy to make a reliable prediction with [[AlphaFold]].
 +
</td></tr></table>
{{Theoretical_model}}
{{Theoretical_model}}
 +
<StructureSection load='Dnac_from_2ggz_a.pdb' size='400' side='right' scene='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/8' caption=''>
 +
==Overview==
==Overview==
===Introduction===
===Introduction===
Line 7: Line 12:
===3D Structure: Homology Model===
===3D Structure: Homology Model===
-
No empirical (X-ray crystallographic) 3D structure for the ''[http://microbewiki.kenyon.edu/index.php/Escherichia_coli E. coli]'' DnaC protein ([http://www.uniprot.org/uniprot/P0AEF0 UniProt P0AEF0]) is available in November, 2012, although one or more [[#Crystal Structure of DnaC Is "In The Pipeline"|might become available]]. In view of this, [[Homology modeling|homology models]] were constructed using the automated Swiss-Model server<ref name="methods">A model was created in 2008 by Swiss-Model using its totally automated ''first approach'' mode with template [[2qgz]]. In 2012, Swiss-Model's automated mode chose a different template, [[3ecc]], and created a similar model. </ref><ref name="swissmodel">Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/2/195 Free full text]. Server: [http://swissmodel.expasy.org swissmodel.expasy.org]</ref>. In 2008 (when this article was largely written and the molecular scenes were prepared), Swiss-Model deemed the only usable template<ref name="3ec2_notemplate">In December, 2008, Swiss-Model deemed the sequence alignment of ''E. coli'' DnaC with ''A. aeolicus'' DnaC to be too unreliable to permit using the [[3ec2]] structure of the latter as a template for homology modeling of <i>E. coli</i> DnaC.</ref> for the homology model to be the crystal structure of a "putative primosome component" from ''[http://microbewiki.kenyon.edu/index.php/Streptococcus_pyogenes Streptococcus pyogenes]'' ([[2qgz]]) determined by the Northeast Structural Genomics Consortium, "to be published". In 2012, after some changes to the Swiss-Model server, it chose a different template, producing a very similar homology model. 
This second template was a crystal structure of the DnaC helicase loader of ''[http://microbewiki.kenyon.edu/index.php/Aquifex_aeolicus Aquifex aeolicus]'' ([[3ecc]])<ref name="3ec2_notemplate" />. The agreement between the models built upon two templates, which have only 27% sequence identity with each other, gives confidence that fold and topology of the models are likely to be correct. Furthermore, the two homology models had identical registrations of sequence with structure (data not shown). Nevertheless, because the sequence identity between the templates and the target <i>E. coli</i> DnaC is only ~20%, there may be some error in the registration of the <i>E. coli</i> DnaC sequence with the model structure. Further, the positions of sidechains in homology models are generally unreliable.
+
No empirical (X-ray crystallographic) 3D structure for the ''[http://microbewiki.kenyon.edu/index.php/Escherichia_coli E. coli]'' DnaC protein ([http://www.uniprot.org/uniprot/P0AEF0 UniProt P0AEF0]) is available in November, 2012, although one or more [[#Crystal Structure of DnaC Is "In The Pipeline"|might become available]]. In view of this, [[Homology modeling|homology models]] were constructed using the automated Swiss-Model server<ref name="methods">A model was created in 2008 by Swiss-Model using its totally automated ''first approach'' mode with template [[2qgz]]. In 2012, Swiss-Model's automated mode chose a different template, [[3ecc]], and created a similar model. </ref><ref name="swissmodel">Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/2/195 Free full text]. Server: [http://swissmodel.expasy.org swissmodel.expasy.org]</ref>. In 2008 (when this article was largely written and the molecular scenes were prepared), Swiss-Model deemed the only usable template<ref name="3ec2_notemplate">In December, 2008, Swiss-Model deemed the sequence alignment of ''E. coli'' DnaC with ''A. aeolicus'' DnaC to be too unreliable to permit using the [[3ec2]] structure of the latter as a template for homology modeling of <i>E. coli</i> DnaC.</ref> for the homology model to be the crystal structure of a "putative primosome component" from ''[http://microbewiki.kenyon.edu/index.php/Streptococcus_pyogenes Streptococcus pyogenes]'' ([[2qgz]]) determined by the Northeast Structural Genomics Consortium, "to be published". In 2012, after some changes to the Swiss-Model server, it chose a different template, producing a very similar homology model. 
This second template was a crystal structure of the DnaC helicase loader of ''[http://microbewiki.kenyon.edu/index.php/Aquifex_aeolicus Aquifex aeolicus]'' ([[3ecc]])<ref name="3ec2_notemplate" />. The agreement between the models built upon two templates, which themselves have only 27% sequence identity with each other, gives confidence that the fold and topology of the models are likely to be correct. Furthermore, the two homology models had identical registrations of sequence with structure (data not shown). Nevertheless, because the sequence identity between the templates and the target <i>E. coli</i> DnaC is only ~20%, there may be some error in the registration of the <i>E. coli</i> DnaC sequence with the model structure. Further, the positions of sidechains in homology models are generally unreliable.
We thank the authors of [[2qgz]] for releasing their structure data at the [[Protein Data Bank]] prior to full publication.
We thank the authors of [[2qgz]] for releasing their structure data at the [[Protein Data Bank]] prior to full publication.
Line 18: Line 23:
====Viewing and Download====
====Viewing and Download====
In addition to the interactive scenes below, the homology models can be downloaded from the Proteopedia server:
In addition to the interactive scenes below, the homology models can be downloaded from the Proteopedia server:
-
* 2008 model [[:Image:Dnac_from_2ggz_a.pdb|Dnac_from_2ggz_a.pdb]] / [http://oca.weizmann.ac.il/oca-docs/fgij/fg.htm?mol=http%3A//proteopedia.org/wiki/images/3/3e/Dnac_from_2ggz_a.pdb view and explore in FirstGlance in Jmol]. (Note that the PDB filename contains a typographical error: 2ggz should have been 2qgz.)
+
* 2008 model residues 55-237 of ''E. coli'' DnaC: [[:Image:Dnac_from_2ggz_a.pdb|Dnac_from_2ggz_a.pdb]] / [http://oca.weizmann.ac.il/oca-docs/fgij/fg.htm?mol=http%3A//proteopedia.org/wiki/images/3/3e/Dnac_from_2ggz_a.pdb view and explore in FirstGlance in Jmol]. (Note that the PDB filename contains a typographical error: 2ggz should have been 2qgz.)
-
* 2012 model [[:Image:Dnac-64-237-from-3eccA.pdb.zip|Dnac-64-237-from-3eccA.pdb.zip]] / [http://oca.weizmann.ac.il/oca-docs/fgij/fg.htm?mol=http%3A//proteopedia.org/wiki/images/c/c0/Dnac-64-237-from-3eccA.pdb.zip view and explore in FirstGlance in Jmol].
+
* 2012 model residues 64-237 of ''E. coli'' DnaC: [[:Image:Dnac-64-237-from-3eccA.pdb.zip|Dnac-64-237-from-3eccA.pdb.zip]] / [http://oca.weizmann.ac.il/oca-docs/fgij/fg.htm?mol=http%3A//proteopedia.org/wiki/images/c/c0/Dnac-64-237-from-3eccA.pdb.zip view and explore in FirstGlance in Jmol].
===Conclusions from Homology Model===
===Conclusions from Homology Model===
-
<applet load='Dnac_from_2ggz_a.pdb' size='450' frame='true' align='right'
 
-
scene='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/8' />
 
- 
The following analysis utilizes the homology model templated on [[2qgz]]. The model templated on [[3ecc]] is very similar, with identical sequence-to-structure registration (not shown). When the two homology models are structurally aligned, 115 alpha carbon atoms can be aligned with RMS deviation of 1.4 &Aring; (not shown).
The following analysis utilizes the homology model templated on [[2qgz]]. The model templated on [[3ecc]] is very similar, with identical sequence-to-structure registration (not shown). When the two homology models are structurally aligned, 115 alpha carbon atoms can be aligned with RMS deviation of 1.4 &Aring; (not shown).
Line 55: Line 57:
==Homology Model Construction==
==Homology Model Construction==
-
<applet load='Dnac_from_2ggz_a.pdb' size='450' frame='true' align='right'
 
-
scene='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/8' />
 
- 
[http://www.bio.umass.edu/micro/faculty/sandler.html Steve Sandler] kindly provided the following sequence for DnaC from E. coli (Uniprot P0AEF0, DNAC_ECOLI):
[http://www.bio.umass.edu/micro/faculty/sandler.html Steve Sandler] kindly provided the following sequence for DnaC from E. coli (Uniprot P0AEF0, DNAC_ECOLI):
Line 78: Line 77:
As indicated [[#3D Structure: Homology Model|above]], in 2008, Swiss-Model found only one usable template for homology modeling, despite the existence of an empirical 3D crystal structure for DnaC with a slightly higher sequence identity.
As indicated [[#3D Structure: Homology Model|above]], in 2008, Swiss-Model found only one usable template for homology modeling, despite the existence of an empirical 3D crystal structure for DnaC with a slightly higher sequence identity.
{{Clear}}
{{Clear}}
 +
 +
===Gaps in the Template Model===
 +
 +
The template was 2QGZ (<scene name='User:Eric_Martz/Sandbox_4/2qgz/3'>initial scene</scene>). The portion of the template used was Glu107-Arg300. Only the amino-terminal 6 residues were not used as template (translucent). Note that there are <scene name='User:Eric_Martz/Sandbox_4/2qgz/5'>three loops</scene> in this segment of the template that lack coordinates due to [[disorder]] in the crystal (marked with spacefilled alpha-carbon atoms).
 +
 +
The missing loops are 202-205 (NGSV), 226-231 (EQATSW), and 268-275 (TIKGSDET). These gaps, which occur between the residues marked /\ below, were apparently ignored in making the model, which has a continuous main chain.
 +
 +
{{Clear}}
 +
 +
==Confirmation of Homology Model By Related Structures==
 +
When the [[PDB]] is searched with the DnaC sequence, the best match (December, 2008) is 23% sequence identity with 183 amino acids in the DnaC helicase loader of ''Aquifex aeolicus'', [[3ec2]] and [[3ecc]]. In order to find whether these structures have the same fold as the template ([[2qgz]] with 19% sequence identity to ''E. coli'' DnaC) used for the homology model, <font color="#3030ff">'''2qgz'''</font> <scene name='User:Eric_Martz/Sandbox_6/2qgz_3ec2_aligned_pdb/1'>was structurally aligned</scene> with <font color="#ff0000">'''3ec2'''</font><ref>The structural alignment of 2qgz with 3ec2 was performed with the ''Magic Fit'' function of DeepView version 3.6beta2. 2qgz 115-259 aligned with 3ec2 42-185 (3 gaps in 3ec2's alignment: 128-9, 134-5, 155-9). 135 alpha carbons were aligned with RMS 2.76 Å. The sequence identity between 2qgz and 3ec2 is 28% over the 185 amino acid length of the shorter, 3ec2. ''Magic Fit'' is a sequence-alignment-guided structural alignment (see [[Structural_alignment_tools#DeepView_.3D_Swiss-PDBViewer|Structural alignment tools]]).</ref>. The similarity of folds lends considerable confidence to the homology model of ''E. coli'' DnaC. This was further confirmed by the 2012 Swiss Model run, when 3ecc was selected as the best template (see discussion above).
 +
 +
The second best sequence-identity hit in the PDB is 39% identity with 54 amino acids (positions 9-63 of chain A) of replication factor C ([[2chg]]), which align with 72-124 of DnaC. When the above homology model of DnaC (made with template 2QGZ) is <scene name='User:Eric_Martz/Sandbox_4/2chg9-63_aligned_with_dnac_mod/1'>structurally aligned</scene> with residues 9-63 of 2CHG<ref>Structural alignment done with DeepView 3.6b3 using Magic Fit of carbon alphas.</ref>, 43 alpha carbons (out of 54) aligned with RMS deviation 2.3 &Aring;. <font color="#ff0000">'''Residues 21-63 of 2CHG'''</font> aligned with <font color="#3030ff">'''residues 80-124 of the DnaC homology model'''</font>. (Non-aligned portions are pastel.) This result adds further confidence to this region of the homology model, since the structural alignment of 2CHG:A21-63 occurred in the same range as the sequence alignment (which was 72-124 in DnaC).
 +
 +
''Download'' the above structural alignments:
 +
*[[:Image:2qgz_3ec2_aligned.pdb|2qgz_3ec2_aligned.pdb]]
 +
*[[:Image:2chg9-63_aligned_with_dnac_model.pdb|2chg9-63_aligned_with_dnac_model.pdb]]
 +
{{Clear}}
 +
 +
==Crystal Structure of DnaC Is "In The Pipeline"==
 +
 +
A sequence-based search at the international [http://targetdb.pdb.org/ Structural Genomics TargetDB] reveals that the closest completed structure is [[2qgz]], the one chosen by SwissModel as a template. ([[3ec2]] and [[3ecc]] were not determined by a structural genomics project.) A number of crystal and NMR structures have sequence identities up to 37% but over shorter stretches, and with higher E values.
 +
 +
Diffraction data have been obtained (but the solved structure not yet deposited) for a ''Listeria monocytogenes'' sequence of 307 residues, pI 5.2, with an E value of 1.6e-05, though only 21% sequence identity. Diffraction-quality crystals (but not yet diffraction data) have not been obtained for any sequence with such a low E value.
 +
 +
''E. coli'' DnaC (245 residues, pI 9.4) has been crystallized by RIKEN Structural Genomics Initiative (Japan), but the crystals may not be of diffraction quality. It has been cloned, expressed as a soluble protein, and purified (but not yet crystallized) by 3 Structural Genomics Groups (RIKEN Structural Genomics Initiative (Japan), Montreal-Kingston Bacterial Structural Genomics Initiative, Midwest Center for Structural Genomics), as have several proteins with >40% sequence identity.
 +
 +
Thus, there is reason for optimism that either a crystal structure, or a more suitable template for homology modeling, might be forthcoming.
 +
 +
 +
==DnaC helicase loader 3D structures==
 +
 +
[[DnaC helicase loader]]
 +
 +
==Additional Resources==
 +
For additional information, see: [[DNA Replication, Repair, and Recombination]]
 +
For additional information, see: [[Nucleic Acids]]
 +
<br />
 +
</StructureSection>
{| class="wikitable" style="text-align:center"
{| class="wikitable" style="text-align:center"
|+ Templates for 2008 Homology Modeling of E. coli DnaC (245 amino acids)
|+ Templates for 2008 Homology Modeling of E. coli DnaC (245 amino acids)
Line 90: Line 128:
(a) Lengths not in parentheses are for crystallographic results, and are counts of amino acids with coordinates; they exclude disordered residues ("gaps" in the model). Lengths in parentheses are for the target sequence of DnaC, or sequences of the crystallized protein (from SEQRES in the PDB file).
(a) Lengths not in parentheses are for crystallographic results, and are counts of amino acids with coordinates; they exclude disordered residues ("gaps" in the model). Lengths in parentheses are for the target sequence of DnaC, or sequences of the crystallized protein (from SEQRES in the PDB file).
-
===Gaps in the Template Model===
 
-
<applet load='Dnac_from_2ggz_a.pdb' size='500' frame='true' align='right'
 
-
scene='User:Eric_Martz/Sandbox_4/2qgz/3' />
 
-
The template was 2QGZ (<scene name='User:Eric_Martz/Sandbox_4/2qgz/3'>initial scene</scene>). The portion of the template used was Glu107-Arg300. Only the amino-terminal 6 residues were not used as template (translucent). Note that there are <scene name='User:Eric_Martz/Sandbox_4/2qgz/5'>three loops</scene> in this segment of the template that lack coordinates due to [[disorder]] in the crystal (marked with spacefilled alpha-carbon atoms).
 
- 
-
The missing loops are 202-205 (NGSV), 226-231 (EQATSW), and 268-275 (TIKGSDET). These gaps, which occur between the residues marked /\ below, were apparently ignored in making the model, which has a continuous main chain.
 
- 
-
{{Clear}}
 
Below is the alignment produced by Swiss Model, used in making the 3D model. Vertical bars for identity were inserted by hand (I may have missed some).
Below is the alignment produced by Swiss Model, used in making the 3D model. Vertical bars for identity were inserted by hand (I may have missed some).
<pre>
<pre>
Line 176: Line 206:
270 280 290 300
270 280 290 300
</pre>
</pre>
- 
-
==Confirmation of Homology Model By Related Structures==
 
-
<applet load='2chg9-63_aligned_with_dnac_model.pdb' size='400' frame='true' align='right' caption='Structural alignment.' scene='User:Eric_Martz/Sandbox_6/2qgz_3ec2_aligned_pdb/1'/>
 
- 
-
When the [[PDB]] is searched with the DnaC sequence, the best match (December, 2008) is 23% sequence identity with 183 amino acids in the DnaC helicase loader of ''Aquifex aeolicus'', [[3ec2]] and [[3ecc]]. In order to find whether these structures have the same fold as the template ([[2qgz]] with 19% sequence identity to ''E. coli'' DnaC) used for the homology model, <font color="#3030ff">'''2qgz'''</font> was structurally aligned (<scene name='User:Eric_Martz/Sandbox_6/2qgz_3ec2_aligned_pdb/1'>restore initial alignment scene</scene>) with <font color="#ff0000">'''3ec2'''</font><ref>The structural alignment of 2qgz with 3ec2 was performed with the ''Magic Fit'' function of DeepView version 3.6beta2. 2qgz 115-259 aligned with 3ec2 42-185 (3 gaps in 3ec2's alignment: 128-9, 134-5, 155-9). 135 alpha carbons were aligned with RMS 2.76 Å. The sequence identity between 2qgz and 3ec2 is 28% over the 185 amino acid length of the shorter, 3ec2. ''Magic Fit'' is a sequence-alignment-guided structural alignment (see [[Structural_alignment_tools#DeepView_.3D_Swiss-PDBViewer|Structural alignment tools]]).</ref>. The similarity of folds lends considerable confidence to the homology model of ''E. coli'' DnaC.
 
- 
-
The second best sequence-identity hit in the PDB is 39% identity with 54 amino acids (positions 9-63 of chain A) of replication factor C ([[2chg]]), which align with 72-124 of DnaC. When the above homology model of DnaC (made with template 2QGZ) is <scene name='User:Eric_Martz/Sandbox_4/2chg9-63_aligned_with_dnac_mod/1'>structurally aligned</scene> with residues 9-63 of 2CHG<ref>Structural alignment done with DeepView 3.6b3 using Magic Fit of carbon alphas.</ref>, 43 alpha carbons (out of 54) aligned with RMS deviation 2.3 &Aring;. <font color="#ff0000">'''Residues 21-63 of 2CHG'''</font> aligned with <font color="#3030ff">'''residues 80-124 of the DnaC homology model'''</font>. (Non-aligned portions are pastel.) This result adds further confidence to this region of the homology model, since the structural alignment of 2CHG:A21-63 occurred in the same range as the sequence alignment (which was 72-124 in DnaC).
 
- 
-
''Download'' the above structural alignments:
 
-
*[[:Image:2qgz_3ec2_aligned.pdb|2qgz_3ec2_aligned.pdb]]
 
-
*[[:Image:2chg9-63_aligned_with_dnac_model.pdb|2chg9-63_aligned_with_dnac_model.pdb]]
 
-
{{Clear}}
 
- 
-
==Crystal Structure of DnaC Is "In The Pipeline"==
 
- 
-
A sequence-based search at the international [http://targetdb.pdb.org/ Structural Genomics TargetDB] reveals that the closest completed structure is [[2qgz]], the one chosen by SwissModel as a template. ([[3ec2]] and [[3ecc]] were not determined by a structural genomics project.) A number of crystal and NMR structures have sequence identities up to 37% but over shorter stretches, and with higher E values.
 
- 
-
Diffraction data have been obtained (but the solved structure not yet deposited) for a ''Listeria monocytogenes'' sequence of 307 residues, pI 5.2, with an E value of 1.6e-05, though only 21% sequence identity. Diffraction-quality crystals (but not yet diffraction data) have not been obtained for any sequence with such a low E value.
 
- 
-
''E. coli'' DnaC (245 residues, pI 9.4) has been crystallized by RIKEN Structural Genomics Initiative (Japan), but the crystals may not be of diffraction quality. It has been cloned, expressed as a soluble protein, and purified (but not yet crystallized) by 3 Structural Genomics Groups (RIKEN Structural Genomics Initiative (Japan), Montreal-Kingston Bacterial Structural Genomics Initiative, Midwest Center for Structural Genomics), as have several proteins with >40% sequence identity.
 
- 
-
Thus, there is reason for optimism that either a crystal structure, or a more suitable template for homology modeling, might be forthcoming.
 
==ConSurf Coloring Script==
==ConSurf Coloring Script==
Line 274: Line 282:
define CON1 selected
define CON1 selected
</pre>
</pre>
- 
-
==3D structures of DnaC helicase loader==
 
- 
-
[[DnaC helicase loader]]
 
- 
-
==Additional Resources==
 
-
For additional information, see: [[DNA Replication, Repair, and Recombination]]
 
-
For additional information, see: [[Nucleic Acids]]
 
-
<br />
 
==Notes & References ==
==Notes & References ==
<references />
<references />

Current revision

Since this homology model analysis was done in 2008-2012, this structure has been solved by cryo-EM (see 6qem). Nevertheless, the story below remains a testament to the effort involved in homology modeling before the structure was solved, and before it was easy to make a reliable prediction with AlphaFold.

Theoretical Model: The protein structure described on this page was determined theoretically, and hence should be interpreted with caution.
{| class="wikitable" style="text-align:center"
|+ Templates for 2008 Homology Modeling of E. coli DnaC (245 amino acids)
! Name !! PDB Code (Resolution) !! Released !! Length (amino acids) (a) !! Template alignment length (a): range (%) !! Target alignment length (a): range (%) !! Aligned Sequence Identity !! Expectations !! Swiss Model Result
|-
| Putative Primosome Component, Streptococcus pyogenes || 2qgz (2.4 Å) || Jul 24 2007 || 183 (308) || 174: 107-292 (95%) [sm] || (183): 55-237 (75%) [sm] || 18.6% [sm]; 19.7% [tdb] || 3.4e-28 [sm]; 0.00027 [tdb]; >10 [pdbB]; 0.0028 [pdbF] || DnaC modeled from 2qgz chain A
|-
| DnaC helicase loader, Aquifex aeolicus || 3ec2 (2.7 Å) || Nov 25 2008 || 175 (180) || 174: 6-179 (95%) [pdbB] || (163): 68-230 (67%) [pdbB] || 23.5% [pdbB] || 0.00059 [pdbB] || "Alignment is not good enough for Modelling"
|}

Sources: Swiss-Model [sm]; targetdb.pdb.org [tdb]; pdb.org using a BLAST search [pdbB], or a FASTA search [pdbF].
(a) Lengths not in parentheses are for crystallographic results, and are counts of amino acids with coordinates; they exclude disordered residues ("gaps" in the model). Lengths in parentheses are for the target sequence of DnaC, or sequences of the crystallized protein (from SEQRES in the PDB file).

Below is the alignment produced by Swiss Model, used in making the 3D model. Vertical bars for identity were inserted by hand (I may have missed some).

                                                 |     | |  |     ||
TARGET    55             R TFNRSGIRPL HQNCSFENYR VECEGQMNAL SKARQYVEEF
2qgzA     100   qkqaais--e riqlvslpks yrhihlsdid vnnasrmeaf saildfveqy
                                                                      
TARGET                     sssss    h h             hhhhhhh hhhhhhhhh 
2qgzA               hhh  h   sss    h h             hhhhhhh hhhhhhhhh 

                            |         | ||   ||     | |              |
TARGET    96    DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK
2qgzA     148   psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk
                                                                      
TARGET                ssss ss     hhh hhhhhhhhhh h h   ssss sshhhhhhh 
2qgzA                 ssss ss     hhh hhhhhhhhhh hh    ssss sshhhhhhh 

                                   ||   |  | ||                |
TARGET    144   DTFRNSGTSE EQLLNDLSNV DLLVIDEIGV QTESKYEKVI INQIVDRRSS
2qgzA     198   naiske---- --eidavknv pvlilddiga vrde-----v lqvilqyrml
                   /\                          / \
TARGET                         hhh     ssssss               hhhhhhhhhh
2qgzA                        hh   h    ssssss               hhhhhhhhhh

                   |     |                 ||| |  |               |
TARGET    194   SKRPTGMLTN SNMEEMTKLL ---GERVMDR MRLGNSLWVI FNWDSYR   
2qgzA     247   eelptfftsn ysfadlerkw awqakrvmer vr-ylarefh leganrr-  
                                      /\
TARGET          h  ssssss    hhhhh          hhhh hh  ssssss s         
2qgzA           h  ssssss    hhhh           hhhh hh hh ssss s 
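
Because the identity bars above were placed by hand, a short script can regenerate them for any pair of aligned rows. The following is a minimal sketch in Python, not part of the original analysis; the function name identity_bars is hypothetical, and the input is assumed to be two aligned rows of equal length (target in upper case, template in lower case, with "-" for gaps), as in the Swiss Model output shown above.

```python
def identity_bars(target: str, template: str) -> str:
    """Return a row with '|' under each column where both rows hold the same residue."""
    if len(target) != len(template):
        raise ValueError("aligned rows must have equal length")
    marks = []
    for a, b in zip(target, template):
        # Gaps ('-') and spacer blanks are not letters, so they never count as identities.
        same = a.isalpha() and a.upper() == b.upper()
        marks.append("|" if same else " ")
    return "".join(marks)


# One ten-residue group copied from the alignment above (DnaC 106-115).
target = "SGKPGTGKNH"
template = "ygdmgigksy"
bars = identity_bars(target, template)
print(bars)
print(target)
print(template)
```

Comparing such output with the hand-placed bars is a quick way to find identities that were overlooked.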

Below is the sequence with ATOM records (coordinates) from 2QGZ, numbered 100-300, showing the gaps as "...". This sequence listing was used to locate the positions marked /\ above.

    1 .......... .......... .......... .......... .......... 
   51 .......... .......... .......... .......... .........Q 
  101 KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA 
  151 EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI 
  201 S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP 

  251 TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR 

(Copied from Protein Explorer's sequence display.)
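
The gap positions can also be derived from the residue numbers that have coordinates. The sketch below is illustrative only and assumes the residue numbers have already been extracted (for example from the ATOM records); the helper name find_gaps is hypothetical, and the observed ranges are transcribed from the sequence listing above.

```python
from typing import Iterable, List, Tuple


def find_gaps(residue_numbers: Iterable[int]) -> List[Tuple[int, int]]:
    """Return (first, last) residue numbers of each run that lacks coordinates."""
    ordered = sorted(set(residue_numbers))
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt - prev > 1:
            # Everything strictly between two observed residues is missing.
            gaps.append((prev + 1, nxt - 1))
    return gaps


# Residues with coordinates in 2QGZ, read off the listing above:
# 100-201, 206-225, 232-267 and 276-300.
observed = (list(range(100, 202)) + list(range(206, 226))
            + list(range(232, 268)) + list(range(276, 301)))
print(find_gaps(observed))
```

With these ranges the function reports the three missing loops described for the template: 202-205, 226-231, and 268-275.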

Below is the alignment of full-length DnaC with 2QGZ according to TargetDB (see above). Note that the 2QGZ structure begins at residue 100, and so the homology model begins with residue 55 of DnaC, indicated with > below.

ID:   DR58   Center: NESGC
E-value: 0.00028  Identity: 19.737%

                                     10        20        30        
Query                        MKNVGDLMQRLQKMMPAHIKPAFKTGEELLAWQKEQGA
                                     Q+ Q   P++I  +++    +     + + 
Subjct EVASFISQHHLSQEQINLSLSKFNQFLVERQKYQLKDPSYIAKGYQPILAMNEGYADVSY
               40        50        60        70        80        90

       40        50    >   60        70        80        90        
Query  IRSAALERENRAMKMQRTFNRSGIRPLHQNCSFENYRVECEGQMNALSKARQYVEEF-DG
       +++  L + ++   +++ ++  ++   +++  + +  V+  ++M+A+S   ++VE++ ++
Subjct LETKELVEAQKQAAISERIQLVSLPKSYRHIHLSDIDVNNASRMEAFSAILDFVEQYPSA
              100       110       120       130       140       150

       100       110       120        130       140       150      
Query  NIASFIFSGKPGTGKNHLAAAICNELLLR-GKSVLIITVADIMSAMKDTFRNSGTSEEQL
       +  ++ + G  G GK++L AA+ +EL  + G S+ ++   ++   +K+++ N++++EE  
Subjct EQKGLYLYGDMGIGKSYLLAAMAHELSEKKGVSTTLLHFPSFAIDVKNAISNGSVKEE--
              160       170       180       190       200          

        160       170        180       190       200       210     
Query  LNDLSNVDLLVIDEIGV-QTESKYEKVIINQIVDRRSSSKRPTGMLTNSNMEEMTK----
       ++ ++NV +L++D+IG+ Q+ S  +  +++ I++ R   + PT + +N ++ ++ +    
Subjct IDAVKNVPVLILDDIGAEQATSWVRDEVLQVILQYRMLEELPTFFTSNYSFADLERKWAT
      210       220       230       240       250       260        

                    220       230       240     
Query  LLG-------ERVMDRMRLGNSLWVIFNWDSYRSRVTGKEY
       + G       +RVM+R+R                       
Subjct IKGSDETWQAKRVMERVRYLAREFHLEGANRR         
      270       280       290       300         

ConSurf Coloring Script

For an explanation of the evolutionary conservation results, see above. The script below is from the 2012 analysis[4]. It can be run in Jmol to color the amino acids of DnaC by evolutionary conservation. CON10 marks insufficient data. CON9 is the highest level of conservation, and CON1 is the lowest (most variable).

select all
color [200,200,200]

select PHE57
color [255,255,150]
spacefill
define CON10 selected

select ILE62, ASN73, GLY106, GLY109, THR110, GLY111, LYS112, HIS114, LEU115
select selected or ALA116, ALA118, GLU153, LEU165, LEU166, ASP169, GLU170
select selected or GLY172, ASP189, ARG191, ASN203, ARG216, ASP219, ARG220
select selected or TRP233, SER235, ARG237
color [160,37,96]
spacefill
define CON9 selected

select ARG55, SER60, GLY61, LEU65, PHE71, TYR74, ALA84, VAL92, PHE95, ASN113
select selected or ILE119, LEU123, VAL130, THR134, THR145, VAL163, ILE168
select selected or GLN174, SER177, GLU180, ILE187, SER192, PRO197, THR198
select selected or THR202, GLY214, MET221, SER226, PHE231
color [240,125,171]
spacefill
define CON8 selected

select HIS66, GLN81, PHE102, VAL135, SER140, LYS143, SER152, LEU156, ASP164
select selected or VAL167, ILE171, ILE184, ASN185, VAL188, GLY199, LEU213
color [250,201,222]
spacefill
define CON7 selected

select THR56, ARG59, CYS69, SER70, ALA88, TYR91, ILE99, SER101, PHE104, SER105
select selected or ALA117, CYS120, ASN121, LEU124, GLY127, SER129, ILE133
select selected or ALA136, ASP137, ILE138, MET139, PHE146, ILE183, GLN186
select selected or ARG190, SER193, MET200, LEU201, SER204, LEU223, GLY224
select selected or ASN225, VAL229
color [252,237,244]
spacefill
define CON6 selected

select ASN58, ARG63, ASN68, VAL76, GLY80, LEU85, ASN98, ALA100, ILE103, LEU131
select selected or MET142, LEU157, LEU160, SER161, VAL182, SER194, ASN205
select selected or MET209, VAL217, TYR236
color [255,255,255]
spacefill
define CON5 selected

select ARG89, GLU94, PRO108, SER149, GLU154, LYS178, TYR179, LYS181, ARG196
select selected or GLU215, LEU227
color [234,255,255]
spacefill
define CON4 selected

select PRO64, GLN67, GLU72, CYS78, MET82, ILE132, GLU176, GLU208, ASN232
color [215,255,255]
spacefill
define CON3 selected

select GLN90, ARG126, ALA141, VAL173, LYS195
color [140,255,255]
spacefill
define CON2 selected

select ARG75, GLU77, GLU79, ASN83, SER86, LYS87, GLU93, ASP96, GLY97, LYS107
select selected or GLU122, LEU125, LYS128, ASP144, ARG147, ASN148, GLY150
select selected or THR151, GLN155, ASN158, ASP159, ASN162, THR175, MET206
select selected or GLU207, THR210, LYS211, LEU212, MET218, ARG222, TRP228
select selected or ILE230, ASP234
color [16,200,209]
spacefill
define CON1 selected
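
The script above follows a regular pattern: for each conservation grade, select the residues, color them, render them spacefilled, and define a named set. As an illustration only (this is not how ConSurf produced the script), the following Python sketch generates commands in the same pattern from a table of grades. The function name jmol_conservation_script and the per_line parameter are hypothetical; the two example grades and their colors are copied from the script above.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]


def jmol_conservation_script(grades: Dict[str, Tuple[RGB, List[str]]],
                             per_line: int = 10) -> str:
    """Build Jmol commands that color and name each group of residues."""
    lines = ["select all", "color [200,200,200]", ""]
    for name, (rgb, residues) in grades.items():
        if not residues:
            continue  # an empty group would otherwise recolor the previous selection
        for start in range(0, len(residues), per_line):
            chunk = ", ".join(residues[start:start + per_line])
            # Later chunks extend the current selection instead of replacing it.
            prefix = "select " if start == 0 else "select selected or "
            lines.append(prefix + chunk)
        lines.append("color [%d,%d,%d]" % rgb)
        lines.append("spacefill")
        lines.append("define %s selected" % name)
        lines.append("")
    return "\n".join(lines)


example = {
    "CON10": ((255, 255, 150), ["PHE57"]),
    "CON2": ((140, 255, 255), ["GLN90", "ARG126", "ALA141", "VAL173", "LYS195"]),
}
script = jmol_conservation_script(example)
print(script)
```

Splitting long residue lists across "select selected or" lines mirrors the layout of the script above and keeps each command short.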

Notes & References

  1. A model was created in 2008 by Swiss-Model using its totally automated first approach mode with template 2qgz. In 2012, Swiss-Model's automated mode chose a different template, 3ecc, and created a similar model.
  2. Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22,195-201. Free full text. Server: swissmodel.expasy.org
  3. In December, 2008, Swiss-Model deemed the sequence alignment of E. coli DnaC with A. aeolicus DnaC to be too unreliable to permit using the 3ec2 structure of the latter as a template for homology modeling of E. coli DnaC.
  4. In the 2012 analysis, ConSurf found 47 unique sequences in Clean Uniprot. The MSA had an average pairwise distance of 0.98.
  5. In 2008, ConSurf found only 10 sequences in SwissProt, with an average pairwise distance (APD), in the multiple sequence alignment, of 1.6. The run shown here used 100 sequences from Uniprot, with an APD of 1.4.
  6. ConSurf result using 50 sequences from Uniprot, with an average pairwise distance in the multiple sequence alignment of 1.6.
  7. Not clear to User:Eric Martz in December, 2008.
  8. Registration refers to the positioning of amino acids along the backbone of the homology model. Amino acids are "in register" when correctly positioned. The sequence of the target protein (DnaC) can be thought of as sliding along the template backbone, as a consequence of the process of sequence alignment (or threading). The correct registration will be known only when an empirical crystallographic structure becomes available for DnaC.
  9. The structural alignment of 2qgz with 3ec2 was performed with the Magic Fit function of DeepView version 3.6beta2. 2qgz 115-259 aligned with 3ec2 42-185 (3 gaps in 3ec2's alignment: 128-9, 134-5, 155-9). 135 alpha carbons were aligned with RMS 2.76 Å. The sequence identity between 2qgz and 3ec2 is 28% over the 185 amino acid length of the shorter, 3ec2. Magic Fit is a sequence-alignment-guided structural alignment (see Structural alignment tools).
  10. Structural alignment done with DeepView 3.6b3 using Magic Fit of carbon alphas.

Proteopedia Page Contributors and Editors

Eric Martz, Alexander Berchansky, Joel L. Sussman, David Canner, Michal Harel
