Journal:Proteins:3

(Difference between revisions)
Current revision (17:13, 8 May 2023)
 
(31 intermediate revisions not shown.)
Line 1: Line 1:
-
==Do ''Newly Born'' orphan proteins resemble ''Never Born'' proteins? A study using three deep learning algorithms==
+
<StructureSection load='' size='350' side='right' scene='96/964832/1hco_morph_ca/5' caption=''>
-
 
+
=== Do ‘''Newly Born''’ Orphan Proteins Resemble ‘''Never Born''’ Proteins? A Study Using Three Deep Learning Algorithms ===
 +
<big>Jing Liu, Rongqing Yuan, Wei Shao, Jitong Wang, Israel Silman and Joel Sussman</big> <ref name="Liu">PMID:37092778</ref>
 +
<hr/>
''Newly Born'' proteins, or orphan proteins, have no sequence homology to other proteins and occur in single species or within a taxonomically restricted gene (TRG) family.
''Newly Born'' proteins, or orphan proteins, have no sequence homology to other proteins and occur in single species or within a taxonomically restricted gene (TRG) family.
Line 8: Line 10:
* AlphaFold2 (AF2) <ref name="AF2">PMID:34265844</ref>
* AlphaFold2 (AF2) <ref name="AF2">PMID:34265844</ref>
* RoseTTAFold (RTF) <ref name="RTF">PMID:34282049</ref>
* RoseTTAFold (RTF) <ref name="RTF">PMID:34282049</ref>
-
* Evolutionary Scale Modeling (ESM-2)<ref>doi.org/10.1101/2022.07.20.500902</ref>
+
* Evolutionary Scale Modeling (ESM-2)<ref>Lin Z et al. & Rives A (2022) Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv 2022.07.20.500902. [http://dx.doi.org/10.1101/2022.07.20.500902 DOI:10.1101/2022.07.20.500902]</ref>
-
AF2 and RTF predict, by default, five top models, while ESM-2 predicts only one model. Morphing between the top models of AF2 and those of RTF give a visual feeling of how similar these 5 models are for each method.
+
AF2 and RTF predict, by default, five top models, while ESM-2 predicts only one model. Morphing between the top models of AF2 and those of RTF gives a visual feeling of how similar these 5 models are for each method.
-
True orphan proteins have no sequence homology to any existing protein. We thought, therefore, that the ''Never Born'' proteins generated and investigated by Tretyachenko ''et al.''<ref name="Tretyachenko">PMID:29133927</ref> would serve as a valuable benchmark for comparison. In their study they experimentally showed that some ''Never Born'' proteins folded into compact structures, ''e.g.'', as seen for Sequences #1856 and #6387.
+
True orphan proteins have no sequence homology to any existing protein. We thought, therefore, that the ''Never Born'' proteins generated and investigated by Tretyachenko ''et al.''<ref name="Tretyachenko">PMID:29133927</ref> would serve as a valuable benchmark for comparison. In their study, they experimentally showed that some ''Never Born'' proteins folded into compact structures, ''e.g.'', as seen for Sequences #1856 and #6387.
{|
{|
|-
|-
Line 18: Line 20:
!AF2-1856
!AF2-1856
|-
|-
-
|[[Image:1856_RTF_Morph_Tube_25_300_170_2sec.GIF|300px]]
+
|[[Image:1856_RTF_Morph_Tube_25_300_170_2sec_Crop.gif|189px]]
-
|[[Image:1856_ESM_25_300_177.jpg|300px]]
+
|[[Image:1856_ESM_25_300_177_Crop.jpg|161px]]
-
|[[Image:1856_AF2_Morph_Tube_25_300_170_2sec.GIF‎|300px]]
+
|[[Image:1856_AF2_Morph_Tube_25_300_170_2sec_Crop.gif‎|280px]]
|}
|}
{|
{|
Line 28: Line 30:
!AF2-6387
!AF2-6387
|-
|-
-
|[[Image:6387_RTF_Morph_Tube_25_60frames_2sec.gif|300px]]
+
|[[Image:6387_RTF_Morph_Tube_25_60frames_2sec_Crop.gif|189px]]
-
|[[Image:6387_ESM_Tube_300_177.jpg|300px]]
+
|[[Image:6387_ESM_Tube_300_177_Crop.jpg|161px]]
-
|[[Image:6387_AF2_Morph_Tube_25_300x270_59_frames.GIF‎|300px]]
+
|[[Image:6387_AF2_Morph_Tube_25_300x270_59_frames_Crop.gif‎|280px]]
|}
|}
Line 40: Line 42:
!AF2-3703
!AF2-3703
|-
|-
-
|[[Image:3703_RTF_Morph_Tube_35_300_270_2_Sec.GIF|300px]]
+
|[[Image:3703_RTF_Morph_Tube_35_300_270_2_Sec_Crop.gif|285px]]
-
|[[Image:ESM_Tube_35_212_178x.jpg|212px]]
+
|[[Image:ESM_Tube_35_212_178x_Crop.jpg|161px]]
-
|[[Image:3703_AF2_Morph_Tube_sc35_300x170_2_sec.GIF‎|300px]]
+
|[[Image:3703_AF2_Morph_Tube_sc35_300x170_2_sec_Crop.gif‎|260px]]
|}
|}
-
We then went on to use the three algorithms on orphan proteins and taxonomically restricted gene products (TRGP) for which no experimental structures were available. We did this in order to see how the predictions of the three algorithms would compare, and whether they would predict novel folds. Although many ORFs have been identified which code for putative orphan proteins, only in a limited number of cases has their association with a well-defined biological activity been established. We have identified seven such proteins for which the necessary sequence data are also available. The number of amino acids for these seven orphans/TRGPs ranges from 109 to 632.
+
We then went on to use the three algorithms on orphan proteins and taxonomically restricted gene products (TRGPs) for which no experimental structures were available. We did this to see how the predictions of the three algorithms would compare, and whether they would predict novel folds. Although many ORFs have been identified that code for putative orphan proteins, only in a limited number of cases has their association with a well-defined biological activity been established. We have identified seven such proteins for which the necessary sequence data are also available. The number of amino acids for these seven orphans/TRGPs ranges from 109 to 632.
-
 
+
-
As an initial step in characterizing these seven proteins, we utilized FoldIndex<ref name="FoldIndex">PMID:15955783 </ref> and flDPnn<ref name="flDPnn">PMID:34290238</ref> to investigate whether they were predicted to be intrinsically disordered proteins (IDP) or folded. Five of the proteins are predicted to be almost completely folded, while the other two, TaFROG and Newtic1, are classified as IDPs, since they are predicted to be disordered throughout almost their entire sequences. Of the seven proteins studied only HCO_011565, a 632 residue nematode protein that was shown to be the target of the nematodicidal small molecule, appears to be fully folded. The three algorithms predict almost identical structures as well as very high pLDDT scores. Most likely, this is for two reasons. Firstly, rather than being a true orphan, HCO_011565 is the product of a TRG<ref name="HCO_011565">PMID: 36313370</ref>, with the BLAST search having revealed that the first 74 homologous sequences, with the lowest E values, were all from nematodes. Secondly, the DALI server revealed a number of hits for the entire predicted structure, as well as for the three subdomains predicted by all three algorithms.
+
 +
As an initial step in characterizing these seven proteins, we utilized FoldIndex<ref name="FoldIndex">PMID:15955783 </ref> and flDPnn<ref name="flDPnn">PMID:34290238</ref> to investigate whether they were predicted to be intrinsically disordered proteins (IDPs) or folded. Five of the proteins are predicted to be almost completely folded, while the other two, TaFROG and Newtic1, are classified as IDPs since they are predicted to be disordered throughout almost their entire sequences. Of the seven proteins studied, only HCO_011565, a 632-residue nematode protein that was shown to be the target of a nematodicidal small molecule, appears to be fully folded. The three algorithms predict almost identical structures, with very high pLDDT scores. Most likely, this is for two reasons. Firstly, rather than being a true orphan, HCO_011565 is the product of a TRG<ref name="HCO_011565">PMID: 36313370</ref>, with the BLAST search having revealed that the first 74 homologous sequences, with the lowest E values, were all from nematodes. Secondly, the DALI server revealed a number of hits for the entire predicted structure, as well as for the three subdomains predicted by all three algorithms. It is striking that these three different algorithms predicted virtually identical 3D models of this TRG product, HCO_011565, with relatively high pLDDT scores for ESM-2 and AF2 of 86.2 and 83.2, respectively. A 3D applet of a morph of the 5 top models of AF2's prediction is shown to the right, and the predictions of all three algorithms are shown just below.
{|
{|
|-
|-
Line 55: Line 56:
!AF2-HCO_011565
!AF2-HCO_011565
|-
|-
-
|[[Image:HCO_RTF_TUBE_sc70_400x277.GIF|400px]]
+
|[[Image:HCO_RTF_TUBE_sc70_400x277_Crop_10.gif|214px]]
-
|[[Image:HCO_ESM_Tube_sc70_271_227.jpg|271px]]
+
|[[Image:HCO_ESM_Tube_sc70_271_227_Crop.jpg|160px]]
-
|[[Image:HCO_AF2_Morph_Tube_sc70_400x227.GIF|400px]]
+
|[[Image:HCO_AF2_Morph_Tube_sc70_400x227_Crop.gif|169px]]
|}
|}
-
An example of an Orphan protein that is predicted by all three algorithms to be an IDP, is the wheat protein TaFROG. It contains 130 amino acids localized in the nucleus. It confers resistance on wheat to the mycotoxigenic fungus, ''Fusarium graminearum''<ref name="TaFROG">PMID:26508775</ref>.
+
An example of an orphan protein that all three algorithms predict to be an IDP is the wheat protein TaFROG. It contains 130 amino acids, is localized in the nucleus, and confers resistance in wheat to the mycotoxigenic fungus ''Fusarium graminearum''<ref name="TaFROG">PMID:26508775</ref>.
{|
{|
|-
|-
Line 66: Line 67:
!AF2-TaFROG
!AF2-TaFROG
|-
|-
-
|[[Image:TaFROG_RTF_Morph_Tube_Sc70_400x284.GIF|400px]]
+
|[[Image:TaFROG_RTF_Morph_Tube_Sc70_400x284_Crop.gif|214px]]
-
|[[Image:TaFROG_ESM_sc70_272_278.jpg|271px]]
+
|[[Image:TaFROG_ESM_sc70_272_278_Crop.jpg|160px]]
-
|[[Image:TaFROG_AF2_Morph_sc70_400x22.GIF|400px]]
+
|[[Image:TaFROG_AF2_Morph_sc70_400x22_Crop.gif|169px]]
|}
|}
-
Because the sequences of orphan proteins lack homology information, protein structure prediction for them has recently become a hot topic<ref name="Church">PMID:36192636</ref>. The approaches and methodologies that we have implemented in this study may provide a starting point for datasets and protocols to evaluate the performance of structure prediction algorithms on sequences that lack homology to other sequences. It has not escaped our notice that this topic may potentially have a large impact on our understanding of how new traits evolve from orphan proteins.
+
Although the pLDDT<ref name="pLDDT">PMID:23986568</ref> scores for most of the ''Never Born'' and ''Newly Born'' proteins are low, the overall conformations of the predicted 3D structures appear to agree consistently on whether a protein is compact and folded or an IDP. Thus, using recently developed AI/deep learning tools to indicate the overall shape of these proteins is useful.
 +
 
 +
Because the sequences of orphan proteins lack homology information, protein structure prediction for them has recently become a hot topic<ref name="Church">PMID:36192636</ref>. The approaches and methodologies we have implemented in this study may provide a starting point for datasets and protocols to evaluate the performance of structure prediction algorithms on sequences that lack homology to other sequences. It has not escaped our notice that this topic may significantly impact our understanding of how new traits evolve from orphan proteins.
 +
 
== References ==
== References ==
<references/>
<references/>
 +
</StructureSection>
 +
__NOEDITSECTION__

Proteopedia Page Contributors and Editors

Joel L. Sussman, Jaime Prilusky

This page complements a publication in scientific journals and is one of Proteopedia's Interactive 3D Complement pages. For additional details, please see I3DC.