Structure of E. coli DnaC helicase loader
From Proteopedia
(2 intermediate revisions not shown.) | |||
Line 1: | Line 1: | ||
+ | <table style="background-color:#ffe0e0"><tr><td> | ||
+ | Since this homology model analysis was done in 2008-2012, this structure has been solved by [[cryo-EM]] (see [[6qem]]). Nevertheless, the story below remains a testament to the effort involved in homology modeling before the structure was solved, and before it was easy to make a reliable prediction with [[AlphaFold]]. | ||
+ | </td></tr></table> | ||
{{Theoretical_model}} | {{Theoretical_model}} | ||
<StructureSection load='Dnac_from_2ggz_a.pdb' size='400' side='right' scene='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/8' caption=''> | <StructureSection load='Dnac_from_2ggz_a.pdb' size='400' side='right' scene='User:Eric_Martz/Sandbox_4/Dnac_model_from_2ggz_a/8' caption=''> | ||
Line 9: | Line 12: | ||
===3D Structure: Homology Model=== | ===3D Structure: Homology Model=== | ||
- | No empirical (X-ray crystallographic) 3D structure for the ''[http://microbewiki.kenyon.edu/index.php/Escherichia_coli E. coli]'' DnaC protein ([http://www.uniprot.org/uniprot/P0AEF0 UniProt P0AEF0]) is available in November, 2012, although one or more [[#Crystal Structure of DnaC Is | + | No empirical (X-ray crystallographic) 3D structure for the ''[http://microbewiki.kenyon.edu/index.php/Escherichia_coli E. coli]'' DnaC protein ([http://www.uniprot.org/uniprot/P0AEF0 UniProt P0AEF0]) was available as of November, 2012, although one or more [[#Crystal Structure of DnaC Is "In The Pipeline"|might become available]]. In view of this, [[Homology modeling|homology models]] were constructed using the automated Swiss-Model server<ref name="methods">A model was created in 2008 by Swiss-Model using its totally automated ''first approach'' mode with template [[2qgz]]. In 2012, Swiss-Model's automated mode chose a different template, [[3ecc]], and created a similar model. </ref><ref name="swissmodel">Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22, 195-201. [http://bioinformatics.oxfordjournals.org/cgi/content/abstract/22/2/195 Free full text]. Server: [http://swissmodel.expasy.org swissmodel.expasy.org]</ref>. In 2008 (when this article was largely written and the molecular scenes were prepared), Swiss-Model deemed the only usable template<ref name="3ec2_notemplate">In December, 2008, Swiss-Model deemed the sequence alignment of ''E. coli'' DnaC with ''A. aeolicus'' DnaC to be too unreliable to permit using the [[3ec2]] structure of the latter as a template for homology modeling of <i>E. coli</i> DnaC.</ref> for the homology model to be the crystal structure of a "putative primosome component" from ''[http://microbewiki.kenyon.edu/index.php/Streptococcus_pyogenes Streptococcus pyogenes]'' ([[2qgz]]) determined by the Northeast Structural Genomics Consortium, "to be published". In 2012, after some changes to the Swiss-Model server, it chose a different template, producing a very similar homology model. This second template was a crystal structure of the DnaC helicase loader of ''[http://microbewiki.kenyon.edu/index.php/Aquifex_aeolicus Aquifex aeolicus]'' ([[3ecc]])<ref name="3ec2_notemplate" />. The agreement between the models built upon two templates that share only 27% sequence identity with each other gives confidence that the fold and topology of the models are likely to be correct. Furthermore, the two homology models had identical registrations of sequence with structure (data not shown). Nevertheless, because the sequence identity between the templates and the target <i>E. coli</i> DnaC is only ~20%, there may be some error in the registration of the <i>E. coli</i> DnaC sequence with the model structure. Further, the positions of sidechains in homology models are generally unreliable. |
We thank the authors of [[2qgz]] for releasing their structure data at the [[Protein Data Bank]] prior to full publication. | We thank the authors of [[2qgz]] for releasing their structure data at the [[Protein Data Bank]] prior to full publication. |
Current revision
Since this homology model analysis was done in 2008-2012, this structure has been solved by cryo-EM (see 6qem). Nevertheless, the story below remains a testament to the effort involved in homology modeling before the structure was solved, and before it was easy to make a reliable prediction with AlphaFold.
Theoretical Model: The protein structure described on this page was determined theoretically, and hence should be interpreted with caution.

Name | PDB Code (Resolution) | Released | Length (amino acids) (a) | Template alignment length (a): range (%) | Target alignment length (a): range (%) | Aligned Sequence Identity | Expectation values | Swiss-Model Result |
---|---|---|---|---|---|---|---|---|
Putative Primosome Component Streptococcus pyogenes | 2qgz (2.4 Å) | Jul 24 2007 | 183 (308) | 174:107-292 (95%) [sm] | (183): 55-237 (75%) [sm] | 18.6% [sm]; 19.7% [tdb] | 3.4e-28 [sm]; 0.00027 [tdb]; >10 [pdbB]; 0.0028 [pdbF] | DnaC modeled from 2qgz chain A |
DnaC helicase loader Aquifex aeolicus | 3ec2 (2.7 Å) | Nov 25 2008 | 175 (180) | 174: 6-179 (95%) [pdbB] | (163): 68-230 (67%) [pdbB] | 23.5% [pdbB] | 0.00059 [pdbB] | "Alignment is not good enough for Modelling" |
Sources: Swiss-Model [sm]; targetdb.pdb.org [tdb]; pdb.org using a BLAST search [pdbB], or a FASTA search [pdbF].
(a) Lengths not in parentheses are for crystallographic results, and are counts of amino acids with coordinates; they exclude disordered residues ("gaps" in the model). Lengths in parentheses are for the target sequence of DnaC, or sequences of the crystallized protein (from SEQRES in the PDB file).
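The percentages in the target alignment column appear to be the aligned span divided by the full length of DnaC. The sketch below checks that arithmetic. It assumes a full length of 245 residues, which is taken from the last Query residue in the TargetDB alignment on this page; the function names are illustrative only.

```python
# Assumption: full-length E. coli DnaC has 245 residues (the Query in the
# TargetDB alignment on this page ends at residue 245).
DNAC_LENGTH = 245


def span_length(first: int, last: int) -> int:
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


def coverage_percent(first: int, last: int, full_length: int = DNAC_LENGTH) -> int:
    """Percent of the full sequence covered by first..last, rounded to an integer."""
    return round(100 * span_length(first, last) / full_length)


# Target ranges copied from the table rows for 2qgz and 3ec2.
rows = {"2qgz": (55, 237), "3ec2": (68, 230)}
for name, (first, last) in rows.items():
    print(name, span_length(first, last), f"{coverage_percent(first, last)}%")
```

Under that assumption the spans come out as 183 and 163 residues, or 75% and 67% of DnaC, in agreement with the table.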
Below is the alignment produced by Swiss-Model, used in making the 3D model. Vertical bars for identity were inserted by hand (I may have missed some).
| | | | ||
TARGET  55          R TFNRSGIRPL HQNCSFENYR VECEGQMNAL SKARQYVEEF
2qgzA  100 qkqaais--e riqlvslpks yrhihlsdid vnnasrmeaf saildfveqy
TARGET sssss h h hhhhhhh hhhhhhhhh
2qgzA hhh h sss h h hhhhhhh hhhhhhhhh

| | || || | | |
TARGET  96 DGN-IASFIF SGKPGTGKNH LAAAICNELL L-RGKSVLII TVADIMSAMK
2qgzA  148 psaeqkglyl ygdmgigksy llaamahels ekkgvsttll hfpsfaidvk
TARGET ssss ss hhh hhhhhhhhhh h h ssss sshhhhhhh
2qgzA ssss ss hhh hhhhhhhhhh hh ssss sshhhhhhh

|| | | || |
TARGET 144 DTFRNSGTSE EQLLNDLSNV DLLVIDEIGV QTESKYEKVI INQIVDRRSS
2qgzA  198 naiske---- --eidavknv pvlilddiga vrde-----v lqvilqyrml
/\ / \
TARGET hhh ssssss hhhhhhhhhh
2qgzA hh h ssssss hhhhhhhhhh

| | ||| | | |
TARGET 194 SKRPTGMLTN SNMEEMTKLL ---GERVMDR MRLGNSLWVI FNWDSYR
2qgzA  247 eelptfftsn ysfadlerkw awqakrvmer vr-ylarefh leganrr-
/\
TARGET h ssssss hhhhh hhhh hh ssssss s
2qgzA h ssssss hhhh hhhh hh hh ssss s
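Because the identity bars were placed by hand, identities can also be counted programmatically from the two aligned rows. The following is a minimal sketch, not Swiss-Model's own calculation: it assumes gaps are written as `-`, compares case-insensitively (the template row is lower case), and divides by the number of columns in which both sequences have a residue. Other tools may use a different denominator, so the result need not reproduce the percentages quoted in the table. The function name `percent_identity` is illustrative.

```python
def percent_identity(target: str, template: str) -> tuple[int, int, float]:
    """Return (identities, aligned columns, percent identity).

    Columns where either sequence has a gap ('-') or padding (' ') are skipped.
    Comparison is case-insensitive because the template row is in lower case.
    """
    if len(target) != len(template):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    aligned = 0
    for a, b in zip(target.upper(), template.upper()):
        if a in "- " or b in "- ":
            continue  # skip gap or padding columns
        aligned += 1
        if a == b:
            identical += 1
    pct = 100.0 * identical / aligned if aligned else 0.0
    return identical, aligned, pct


# Last ten columns of the first alignment block (DnaC 86-95 vs. 2qgz chain A).
print(percent_identity("SKARQYVEEF", "saildfveqy"))  # (3, 10, 30.0)
```

For a whole alignment, the rows of each block would be concatenated (spaces removed) before calling the function.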
Below is the sequence with ATOM records (coordinates) from 2QGZ, numbered 100-300, showing the gaps as "...". This sequence listing was used to locate the positions marked /\ above.
  1 .......... .......... .......... .......... ..........
 51 .......... .......... .......... .......... .........Q
101 KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA
151 EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI
201 S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP
251 TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR
(Copied from Protein Explorer's sequence display.)
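The gap positions can be read off the listing mechanically. The sketch below is a hypothetical helper, not part of Protein Explorer: it assumes each row holds 50 consecutive residues starting at the number shown, and that `.` marks a residue without coordinates.

```python
import re

# Sequence listing as shown above: (number of first residue, residues).
# '.' marks a position with no ATOM records (no coordinates).
LISTING = [
    (1,   ".......... .......... .......... .......... .........."),
    (51,  ".......... .......... .......... .......... .........Q"),
    (101, "KQAAISERIQ LVSLPKSYRH IHLSDIDVNN ASRMEAFSAI LDFVEQYPSA"),
    (151, "EQKGLYLYGD MGIGKSYLLA AMAHELSEKK GVSTTLLHFP SFAIDVKNAI"),
    (201, "S....KEEID AVKNVPVLIL DDIGA..... .VRDEVLQVI LQYRMLEELP"),
    (251, "TFFTSNYSFA DLERKWA... .....WQAKR VMERVRYLAR EFHLEGANRR"),
]


def find_gaps(listing):
    """Return inclusive (first, last) residue numbers of each run of '.'."""
    start = listing[0][0]
    seq = "".join(text.replace(" ", "") for _, text in listing)
    return [(m.start() + start, m.end() + start - 1)
            for m in re.finditer(r"\.+", seq)]


gaps = find_gaps(LISTING)
ordered = sum(ch != "." for _, text in LISTING for ch in text.replace(" ", ""))
print(gaps)     # [(1, 99), (202, 205), (226, 231), (268, 275)]
print(ordered)  # 183
```

The three internal gaps correspond to the three /\ marks in the Swiss-Model alignment, and the count of 183 residues with coordinates matches the length given for 2qgz in the table.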
Below is the alignment of full-length DnaC with 2QGZ according to TargetDB (see above). Note that the 2QGZ structure begins at residue 100, and so the homology model begins with residue 55 of DnaC, indicated with > below.
ID: DR58 Center: NESGC E-value: 0.00028 Identity: 19.737%

10 20 30
Query                        MKNVGDLMQRLQKMMPAHIKPAFKTGEELLAWQKEQGA
Q+ Q P++I +++ + + +
Subjct EVASFISQHHLSQEQINLSLSKFNQFLVERQKYQLKDPSYIAKGYQPILAMNEGYADVSY
40 50 60 70 80 90

40 50 > 60 70 80 90
Query  IRSAALERENRAMKMQRTFNRSGIRPLHQNCSFENYRVECEGQMNALSKARQYVEEF-DG
+++ L + ++ +++ ++ ++ +++ + + V+ ++M+A+S ++VE++ ++
Subjct LETKELVEAQKQAAISERIQLVSLPKSYRHIHLSDIDVNNASRMEAFSAILDFVEQYPSA
100 110 120 130 140 150

100 110 120 130 140 150
Query  NIASFIFSGKPGTGKNHLAAAICNELLLR-GKSVLIITVADIMSAMKDTFRNSGTSEEQL
+ ++ + G G GK++L AA+ +EL + G S+ ++ ++ +K+++ N++++EE
Subjct EQKGLYLYGDMGIGKSYLLAAMAHELSEKKGVSTTLLHFPSFAIDVKNAISNGSVKEE--
160 170 180 190 200

160 170 180 190 200 210
Query  LNDLSNVDLLVIDEIGV-QTESKYEKVIINQIVDRRSSSKRPTGMLTNSNMEEMTK----
++ ++NV +L++D+IG+ Q+ S + +++ I++ R + PT + +N ++ ++ +
Subjct IDAVKNVPVLILDDIGAEQATSWVRDEVLQVILQYRMLEELPTFFTSNYSFADLERKWAT
210 220 230 240 250 260

220 230 240
Query  LLG-------ERVMDRMRLGNSLWVIFNWDSYRSRVTGKEY
+ G +RVM+R+R
Subjct IKGSDETWQAKRVMERVRYLAREFHLEGANRR
270 280 290 300
ConSurf Coloring Script
For an explanation of the evolutionary conservation results, see above. The script below is from the 2012 analysis[4]. It can be run in Jmol to color the amino acids of DnaC by evolutionary conservation. CON10 marks insufficient data. CON9 is the highest level of conservation, and CON1 is the lowest (most variable).
select all
color [200,200,200]
select PHE57
color [255,255,150]
spacefill
define CON10 selected
select ILE62, ASN73, GLY106, GLY109, THR110, GLY111, LYS112, HIS114, LEU115
select selected or ALA116, ALA118, GLU153, LEU165, LEU166, ASP169, GLU170
select selected or GLY172, ASP189, ARG191, ASN203, ARG216, ASP219, ARG220
select selected or TRP233, SER235, ARG237
color [160,37,96]
spacefill
define CON9 selected
select ARG55, SER60, GLY61, LEU65, PHE71, TYR74, ALA84, VAL92, PHE95, ASN113
select selected or ILE119, LEU123, VAL130, THR134, THR145, VAL163, ILE168
select selected or GLN174, SER177, GLU180, ILE187, SER192, PRO197, THR198
select selected or THR202, GLY214, MET221, SER226, PHE231
color [240,125,171]
spacefill
define CON8 selected
select HIS66, GLN81, PHE102, VAL135, SER140, LYS143, SER152, LEU156, ASP164
select selected or VAL167, ILE171, ILE184, ASN185, VAL188, GLY199, LEU213
color [250,201,222]
spacefill
define CON7 selected
select THR56, ARG59, CYS69, SER70, ALA88, TYR91, ILE99, SER101, PHE104, SER105
select selected or ALA117, CYS120, ASN121, LEU124, GLY127, SER129, ILE133
select selected or ALA136, ASP137, ILE138, MET139, PHE146, ILE183, GLN186
select selected or ARG190, SER193, MET200, LEU201, SER204, LEU223, GLY224
select selected or ASN225, VAL229
color [252,237,244]
spacefill
define CON6 selected
select ASN58, ARG63, ASN68, VAL76, GLY80, LEU85, ASN98, ALA100, ILE103, LEU131
select selected or MET142, LEU157, LEU160, SER161, VAL182, SER194, ASN205
select selected or MET209, VAL217, TYR236
color [255,255,255]
spacefill
define CON5 selected
select ARG89, GLU94, PRO108, SER149, GLU154, LYS178, TYR179, LYS181, ARG196
select selected or GLU215, LEU227
color [234,255,255]
spacefill
define CON4 selected
select PRO64, GLN67, GLU72, CYS78, MET82, ILE132, GLU176, GLU208, ASN232
color [215,255,255]
spacefill
define CON3 selected
select GLN90, ARG126, ALA141, VAL173, LYS195
color [140,255,255]
spacefill
define CON2 selected
select ARG75, GLU77, GLU79, ASN83, SER86, LYS87, GLU93, ASP96, GLY97, LYS107
select selected or GLU122, LEU125, LYS128, ASP144, ARG147, ASN148, GLY150
select selected or THR151, GLN155, ASN158, ASP159, ASN162, THR175, MET206
select selected or GLU207, THR210, LYS211, LEU212, MET218, ARG222, TRP228
select selected or ILE230, ASP234
color [16,200,209]
spacefill
define CON1 selected
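Each conservation grade in the script above follows the same pattern: select the residues (extending the selection with `select selected or ...` when the list is long), color them, render them as spacefill, and save the selection under a name with `define`. The sketch below generates that pattern from a residue list. It is an illustration only, not ConSurf's own generator; the function name `jmol_commands` and the `per_line` chunk size are hypothetical.

```python
def jmol_commands(set_name, rgb, residues, per_line=7):
    """Emit Jmol commands that select residues, color them, and define a named set.

    Residues are selected in chunks; later chunks extend the selection with
    'select selected or ...', the same pattern used in the script above.
    """
    lines = []
    for i in range(0, len(residues), per_line):
        chunk = ", ".join(residues[i:i + per_line])
        prefix = "select " if i == 0 else "select selected or "
        lines.append(prefix + chunk)
    r, g, b = rgb
    lines.append(f"color [{r},{g},{b}]")
    lines.append("spacefill")
    lines.append(f"define {set_name} selected")
    return lines


# Example: the CON2 set (five residues) from the script above.
con2 = jmol_commands("CON2", (140, 255, 255),
                     ["GLN90", "ARG126", "ALA141", "VAL173", "LYS195"])
print("\n".join(con2))
```

Once the script has run, the defined names (CON1 through CON10) can be used in later Jmol `select` commands to pick out all residues of one conservation grade.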
Notes & References
1. A model was created in 2008 by Swiss-Model using its totally automated first approach mode with template 2qgz. In 2012, Swiss-Model's automated mode chose a different template, 3ecc, and created a similar model.
2. Arnold K., Bordoli L., Kopp J., and Schwede T. (2006). The SWISS-MODEL Workspace: A web-based environment for protein structure homology modelling. Bioinformatics, 22, 195-201. Free full text. Server: swissmodel.expasy.org
3. In December, 2008, Swiss-Model deemed the sequence alignment of E. coli DnaC with A. aeolicus DnaC to be too unreliable to permit using the 3ec2 structure of the latter as a template for homology modeling of E. coli DnaC.
4. In the 2012 analysis, ConSurf found 47 unique sequences in Clean UniProt. The MSA had an average pairwise distance of 0.98.
5. In 2008, ConSurf found only 10 sequences in SwissProt, with an average pairwise distance (APD), in the multiple sequence alignment, of 1.6. The run shown here used 100 sequences from UniProt, with an APD of 1.4.
6. ConSurf result using 50 sequences from UniProt, with an average pairwise distance in the multiple sequence alignment of 1.6.
7. Not clear to User:Eric Martz in December, 2008.
8. Registration refers to the positioning of amino acids along the backbone of the homology model. Amino acids are "in register" when correctly positioned. The sequence of the target protein (DnaC) can be thought of as sliding along the template backbone, as a consequence of the process of sequence alignment (or threading). The correct registration will be known only when an empirical crystallographic structure becomes available for DnaC.
9. The structural alignment of 2qgz with 3ec2 was performed with the Magic Fit function of DeepView version 3.6beta2. 2qgz 115-259 aligned with 3ec2 42-185 (3 gaps in 3ec2's alignment: 128-9, 134-5, 155-9). 135 alpha carbons were aligned with RMS 2.76 Å. The sequence identity between 2qgz and 3ec2 is 28% over the 185 amino acid length of the shorter, 3ec2. Magic Fit is a sequence-alignment-guided structural alignment (see Structural alignment tools).
10. Structural alignment done with DeepView 3.6b3 using Magic Fit of alpha carbons.
Proteopedia Page Contributors and Editors
Eric Martz, Alexander Berchansky, Joel L. Sussman, David Canner, Michal Harel