User:Eric Martz/AlphaFold3 case studies
From Proteopedia
(→Superpositon RMSD Summary) |
(→8JRP Chain A: Partially Untemplated) |
||
| (73 intermediate revisions not shown.) | |||
| Line 1: | Line 1: | ||
| - | <table style="background-color:#ffffc0;"><tr><td> | + | <!--<table style="background-color:#ffffc0;"><tr><td> |
This page is under construction. This notice will be removed when it is completed. [[User:Eric Martz|Eric Martz]] 02:03, 13 November 2024 (UTC) | This page is under construction. This notice will be removed when it is completed. [[User:Eric Martz|Eric Martz]] 02:03, 13 November 2024 (UTC) | ||
| - | </td></tr></table> | + | </td></tr></table>--> |
| + | '''Summary''': | ||
| + | <br>The purpose of these case studies was for me ([[User:Eric Martz]]) to gain experience using the [https://alphafoldserver.com AlphaFold3 Server]. See resulting [[How_to_predict_structures_with_AlphaFold#Prediction_Servers|guidance for using the AlphaFold3 server]]. | ||
| + | |||
| + | Comparisons were made between 3 empirical models and predictions by the [https://alphafoldserver.com AlphaFold3 Server] and the [https://alphafold.ebi.ac.uk AlphaFold2 Database]. Predictions by AlphaFold3 had '''high confidence''' (except for disordered regions), and '''high accuracy''' for naturally occurring sequences, including a 3-chain assembly with complex structural inter-dependencies ([[1a0r]]). In the two cases checked, the predictions of AlphaFold2 were slightly less accurate than those of AlphaFold3. | ||
| + | |||
| + | In the case of chain A of [[8jrp]], loop 397-423 is predicted to be disordered, but nevertheless has coordinates substantiated by the [[cryo-EM]] density map. AlphaFold3 had no confidence in its prediction of this segment, and placed it in a different position than in the empirical model. | ||
| + | |||
| + | One synthetic construct was analyzed ([[8k8g]]), for which '''no similar sequences or structures''' are in the PDB ''prior to 2024''<ref name="8k8g.uniprot">For the sequence in 8k8g, the top BLAST hit in UniProt is 37% identical for 79% of length.</ref>. AlphaFold3's stated training set excludes PDB entries after September, 2021. AlphaFold3 had '''no confidence''' in its prediction for the fold of this chain, which was very different from the empirical structure. The empirical model is a homodimer, but none of the 5 homodimers predicted by AlphaFold3 (with no confidence, ipTM 0.10-0.37) had inter-chain contacts similar to those in the empirical structure. | ||
| + | |||
==AlphaFold3 Example 8AW3== | ==AlphaFold3 Example 8AW3== | ||
| Line 16: | Line 25: | ||
The AlphaFold3 prediction superposes (FATCAT rigid) with all 279 alpha carbons of chain 3 of [[8aw3]] with [[RMSD]] 2.2 Å. The morph shows the largest discrepancy in the red (no confidence) loop at center bottom, which has sequence SNSGCRKSNR (238-247 in the numbering of both UniProt and 8aw3). These residues have coordinates in 8aw3, but 3 residues, CRK, have incomplete sidechains. | The AlphaFold3 prediction superposes (FATCAT rigid) with all 279 alpha carbons of chain 3 of [[8aw3]] with [[RMSD]] 2.2 Å. The morph shows the largest discrepancy in the red (no confidence) loop at center bottom, which has sequence SNSGCRKSNR (238-247 in the numbering of both UniProt and 8aw3). These residues have coordinates in 8aw3, but 3 residues, CRK, have incomplete sidechains. | ||
| - | <table class="wikitable"><tr><td> | + | <table class="wikitable"><tr><td width="470"> |
[[Image:Q38107-af3.png]] | [[Image:Q38107-af3.png]] | ||
</td><td width="400"> | </td><td width="400"> | ||
[[Image:Af3-vs-8aw3-3-fatcat-rigid.gif]] | [[Image:Af3-vs-8aw3-3-fatcat-rigid.gif]] | ||
</td></tr><tr><td> | </td></tr><tr><td> | ||
| - | [https://alphafoldserver.com AlphaFold3] prediction for Q38107. | + | [https://alphafoldserver.com AlphaFold3] prediction for Q381Q7. [[pLDDT]] > 70 is confident. |
| + | [[Image:PLDDT-color-key.png|300px]] | ||
</td><td> | </td><td> | ||
FATCAT morph between AlphaFold3 prediction and chain 3 of [[8aw3]]. | FATCAT morph between AlphaFold3 prediction and chain 3 of [[8aw3]]. | ||
| Line 41: | Line 51: | ||
==8JRP Chain A: Partially Untemplated== | ==8JRP Chain A: Partially Untemplated== | ||
<table class="wikitable"><tr><td rowspan="2"> | <table class="wikitable"><tr><td rowspan="2"> | ||
| + | In November, 2024, when this analysis was done, the | ||
[https://alphafoldserver.com AlphaFold3 Server] FAQ says that when [[empirical models|empirical]] templates exist in the [[wwPDB]], they will be used unconditionally, adding that “… the server searches PDB for template structures with a cutoff date of 30th September 2021 …”. | [https://alphafoldserver.com AlphaFold3 Server] FAQ says that when [[empirical models|empirical]] templates exist in the [[wwPDB]], they will be used unconditionally, adding that “… the server searches PDB for template structures with a cutoff date of 30th September 2021 …”. | ||
| Line 97: | Line 108: | ||
====Five AlphaFold3 models==== | ====Five AlphaFold3 models==== | ||
| - | The AlphaFold3 server delivers five models. In this instance, four | + | The AlphaFold3 server delivers five models. In this instance, four had pTM 0.77, and one (model_3.cif) had pTM 0.76. |
Superposing the latter with the empirical model gave RMSD 2.1 Å for all 689 alpha carbons in the empirical model. This is the same RMSD as obtained with the "top" model (model_0.cif). In conclusion, there was '''no significant difference '''between the abilities of these two of the five models to superpose with the empirical model. | Superposing the latter with the empirical model gave RMSD 2.1 Å for all 689 alpha carbons in the empirical model. This is the same RMSD as obtained with the "top" model (model_0.cif). In conclusion, there was '''no significant difference '''between the abilities of these two of the five models to superpose with the empirical model. | ||
| Line 103: | Line 114: | ||
A rigid superposition of the AlphaFold2 Database prediction for Q05086 with chain A of [[8jrp]] gave [[RMSD]] 3.0 Å. The main discrepancy, aside from that expected for the loop 397-423 (red, no confidence), was rotation of the C-terminal domain. Yet when submitted allowing twists, no improvement occurred. In conclusion, the AlphaFold3 prediction was better than the AlphaFold2 prediction. | A rigid superposition of the AlphaFold2 Database prediction for Q05086 with chain A of [[8jrp]] gave [[RMSD]] 3.0 Å. The main discrepancy, aside from that expected for the loop 397-423 (red, no confidence), was rotation of the C-terminal domain. Yet when submitted allowing twists, no improvement occurred. In conclusion, the AlphaFold3 prediction was better than the AlphaFold2 prediction. | ||
| - | == | + | ==1a0r: Three Sequence-Distinct Chains== |
| + | ===Empirical Model 1a0r=== | ||
| + | <table align="right" class="wikitable"><tr><td> | ||
| + | [[1a0r]] is an assembly of three sequence-distinct chains, and a highly inter-dependent structure. It was published in 1998 as an X-ray structure with resolution 2.8 Å. | ||
| + | *<font color="#50b050"><b>Chain B</b></font>, transducin beta, length 339, forms a compact seven-bladed beta-propeller, with the exception of its 48-residue N-terminus, which forms a helix wrapping against the beta-propeller. | ||
| + | *<font color="#d0a050"><b>Chain G</b></font>, transducin gamma, length 65 ([https://www.uniprot.org/uniprotkb/P02698/entry#sequences of 74]), is partially helical, and wraps around the beta-propeller, without any compact domain of its own. It pairs with the N-terminus of chain B. | ||
| + | *<font color="#c09090"><b>Chain P</b></font>, phosducin, length 245, forms a compact alpha/beta domain with its C-terminal 123 residues. Its ~95-residue N-terminus, partially helical, wraps against the beta-propeller, without contacting chain G or the N-terminus of chain B. This is the only chain with [[missing residues]]: 12 at the N terminus, a loop of 30 (38-67), and 15 at the C terminus. All missing residues are predicted to be in disordered segments: | ||
| + | |||
| + | [[Image:1a0r-chain-p-disorder-vs-unmodeled-.png]] | ||
| + | <br> | ||
| + | Simplified [https://www.rcsb.org/sequence/1A0R sequence annotations from RCSB.org]. | ||
| + | |||
| + | </td><td> | ||
| + | [[Image:1a0r-fgij.gif]] | ||
| + | </td></tr></table> | ||
| + | |||
| + | ===AlphaFold3 Prediction for 1a0r=== | ||
| + | |||
| + | <table align="right" class="wikitable"><tr><td rowspan=2> | ||
| + | |||
| + | The three sequences were submitted to AlphaFold3 as a single job. The job is templated, since 1a0r was published in 1998. There are a total of 592 amino acids in 1a0r. AlphaFold3 filled in the 57 missing residues, making a total of 649. | ||
| + | |||
| + | For the AlphaFold3 prediction, pTM was 0.86, and ipTM was 0.91, indicating good confidence overall. As expected, the disordered ends and loop were predicted with <font color="red"><b>no confidence (red)</b></font>. Average [[pLDDT]] for alpha carbons (obtained with [[FirstGlance/How to get average pLDDT from AlphaFold models|FirstGlance]]): | ||
| + | <br> | ||
| + | Key: <50, no confidence; 50-70 low confidence, caution; 70-90: confident; >90: high confidence. | ||
| + | * 88.4 for the 3-chain assembly | ||
| + | * 95.8 for transducin beta, length 340 | ||
| + | * 90.4 for transducin gamma, length 65 | ||
| + | * 77.6 for phosducin, length 245 (including disordered regions) | ||
| + | ** 88.0 without disordered regions (sequence ranges 13-37, 68-230) | ||
| + | ** 30.8 for disordered ends (27 residues) | ||
| + | ** 54.6 for the disordered loop (30 residues) | ||
| + | |||
| + | Since FATCAT superposes only pairs of chains, superposition of the 3-chain assemblies was done in [[PyMOL]], which gave RMSD '''1.8 Å''' for 522 (88%) of 592 alpha carbons. Optimally, after iterative rejection of the most separated atom pairs, 375 alpha carbons (63% of 592) superposed with RMSD '''0.63 Å'''. Pairs of individual chains: | ||
| + | * Chain B: 337 (99%) of 339 alpha carbons superposed with RMSD 1.0 Å. | ||
| + | * Chain G: 61 (94%) of 65 alpha carbons superposed with RMSD 0.5 Å. | ||
| + | * Chain P: 184 (98%) of 188 alpha carbons superposed with RMSD 1.8 Å. | ||
| + | Superpositions with [[ChimeraX]] gave similar RMSD values (see [[#Superposition RMSD Summary|Summary]]). | ||
| + | |||
| + | </td><td width="350"> | ||
| + | [[Image:Af3-chains-in-1a0r-licorice.gif]] | ||
| + | </td><td width="310"> | ||
| + | [[Image:1a0r-vs-af3-morph-licorice-far.gif]] | ||
| + | </td></tr><tr><td> | ||
| + | AlphaFold3 prediction for the 3 chains in [[1a0r]]. [[pLDDT]] >70 is confident. | ||
| + | [[Image:PLDDT-color-key.png|330px]] | ||
| + | </td><td> | ||
| + | [[ChimeraX]] morph between [[1a0r]] and the AlphaFold3 prediction. <font color="gray"><b>Farnesyl ligand (GRAY SPHERES)</b></font> was not available in AlphaFold3. Its absence accounts for the largest differences. | ||
| + | </td></tr></table> | ||
| + | |||
| + | ==Untemplated Synthetic Construct 8K8G== | ||
| + | |||
| + | <table align="right" class="wikitable"><tr><td rowspan=2> | ||
| + | [[8kck]] and [[8k8g]] are X-ray structures (resolutions 1.35 Å, worrisome Rfree values<ref name="8kck.rfree">8kck: Rfree worse than average. PDB-ReDo increases both R and Rfree! 8k8g: PDB-ReDo reduces R from 0.20 to 0.17, and Rfree from 0.222 ("unreliable") to 0.207 ("worse than average"). Characterizations "worse than average" and "unreliable" are from FirstGlance, which uses [http://firstglance.jmol.org/notes.htm#grading statistics from the PDB].</ref>) of a synthetic construct of 214 amino acids, with coordinates for 209 (missing coordinates for a C-terminal His tag). These entries have no journal publication, but indicate that the authors believe the biological assembly to be a homodimer. | ||
| + | |||
| + | These structures were deposited into the PDB in 2024. There are no previous entries with chains having closely similar structure, and none with similar sequences. Therefore, these structures appear to be '''untemplated for AlphaFold3''', which states the cutoff date of its training set to be 30th September 2021. | ||
| + | |||
| + | AlphaFold3 is '''unable to predict the fold of this chain with confidence''' (pTM 0.36, average [[pLDDT]] 42). The empirical structure has one 7-strand beta sheet and 6 helices. The AlphaFold3 prediction is quite different, with one 5-strand beta sheet and one 2-strand sheet, plus 5 helices. | ||
| + | |||
| + | Similarly, given two chains, AlphaFold3's top ranked prediction had pTM 0.49, ipTM 0.37. It had no confidence in any of the 5 models it returned (not shown), either in the fold or in the contact between the two chains. Indeed, none of the homodimer predictions were structurally close to the empirical models (not shown). | ||
| + | |||
| + | |||
| + | </td><td width="350"> | ||
| + | [[Image:8k8g-secondary-structure.gif]] | ||
| + | </td><td width="350"> | ||
| + | [[Image:Af3-for-8k8g.gif]] | ||
| + | </td></tr><tr><td> | ||
| + | [[8k8g]] monomer. | ||
| + | {{Template:ColorKey_Helix}}, | ||
| + | {{Template:ColorKey_Strand}}, | ||
| + | {{Template:ColorKey_Loop}}. | ||
| + | </td><td> | ||
| + | AlphaFold3 prediction for 1 copy of the sequence in [[8k8g]]. | ||
| + | [[pLDDT]] >70 is confident. | ||
| + | [[Image:PLDDT-color-key.png|330px]] | ||
| + | </td></tr></table> | ||
| + | |||
| + | ==Superposition RMSD Summary== | ||
| + | Each superposition in this table is discussed above. | ||
| + | |||
<table class="wikitable"><tr><th rowspan=2> | <table class="wikitable"><tr><th rowspan=2> | ||
Model 1 | Model 1 | ||
| Line 109: | Line 199: | ||
Model 2 | Model 2 | ||
</th><th colspan=4> | </th><th colspan=4> | ||
| - | + | Superposition RMSD (Superposed αC/Total αC) | |
</th></tr><tr><th> | </th></tr><tr><th> | ||
FATCAT Rigid | FATCAT Rigid | ||
| Line 115: | Line 205: | ||
FATCAT Flexible | FATCAT Flexible | ||
</th><th> | </th><th> | ||
| - | + | PyMOL<sup>b</sup> | |
</th><th> | </th><th> | ||
| - | + | ChimeraX<sup>c</sup> | |
</th></tr><tr><td> | </th></tr><tr><td> | ||
[[8aw3]] chain 3 tRNA deaminase | [[8aw3]] chain 3 tRNA deaminase | ||
| Line 158: | Line 248: | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
| + | 1.9 Å (636/689, 92%) | ||
| + | <br> | ||
| + | 1.0 Å (540/689, 78%) | ||
</td><td> | </td><td> | ||
| + | |||
</td></tr><tr><td> | </td></tr><tr><td> | ||
| + | [[8jrp]] chain A ubiquitin ligase | ||
</td><td> | </td><td> | ||
| + | AF3 Q05086 Model 3 | ||
</td><td> | </td><td> | ||
| + | 2.1 Å (all 689) | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
| + | |||
</td></tr><tr><td> | </td></tr><tr><td> | ||
| + | 8jrp chain A 120-519 untemplated? | ||
</td><td> | </td><td> | ||
| + | AF3 Q05086 | ||
</td><td> | </td><td> | ||
| + | 0.9 Å (all 302) | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
| + | |||
</td></tr><tr><td> | </td></tr><tr><td> | ||
| + | 8jrp chain A 520-869 templated | ||
</td><td> | </td><td> | ||
| + | AF3 Q05086 | ||
</td><td> | </td><td> | ||
| + | 1.3 Å (all 350) | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
| + | |||
</td></tr><tr><td> | </td></tr><tr><td> | ||
| + | [[8jrp]] chain A ubiquitin ligase | ||
</td><td> | </td><td> | ||
| + | AF2 DB Q05086 | ||
</td><td> | </td><td> | ||
| + | 3.0 Å | ||
</td><td> | </td><td> | ||
| + | (no improvement) | ||
</td><td> | </td><td> | ||
</td><td> | </td><td> | ||
| + | |||
| + | </td></tr><tr><td> | ||
| + | [[1a0r]] 3-chain assembly | ||
| + | </td><td> | ||
| + | AF3 [https://www.uniprot.org/uniprotkb/P62871/entry#sequences P62871] + [https://www.uniprot.org/uniprotkb/P02698/entry#sequences P02698] + | ||
| + | <br> | ||
| + | [https://www.uniprot.org/uniprotkb/P19632/entry#sequences P19632] | ||
| + | </td><td> | ||
| + | </td><td> | ||
| + | </td><td> | ||
| + | 1.8 Å (522/592, 88%) | ||
| + | <br> | ||
| + | 0.63 Å (375/592, 63%) | ||
| + | </td><td> | ||
| + | 1.6 Å (all 592) | ||
| + | <br> | ||
| + | 0.8 Å (482/592, 81%) | ||
| + | <br> | ||
| + | 0.5 Å (379/592, 64%) | ||
| + | |||
</td></tr></table> | </td></tr></table> | ||
| - | AF3 = [https://alphafoldserver.com AlphaFold3 Server] | + | AF3 = [https://alphafoldserver.com AlphaFold3 Server]. Model 0 (best of five, 0-4) was used unless otherwise noted. |
<br> | <br> | ||
AF2 DB = [https://alphafold.ebi.ac.uk AlphaFold2 DataBase] | AF2 DB = [https://alphafold.ebi.ac.uk AlphaFold2 DataBase] | ||
<br> | <br> | ||
Note a: Disordered loop 57-108 deleted from both models. | Note a: Disordered loop 57-108 deleted from both models. | ||
| + | <br> | ||
| + | Note b: PyMOL super command, limited to alpha carbons, e.g. ''super empirical////ca, af3////ca''. | ||
| + | <br> | ||
| + | Note c: ChimeraX Tools, Structure Analysis, MatchMaker. Default "pruning" eliminated pairs farther than 2.0 Å apart. A cutoff of 1.0 Å was also used. | ||
==Methods== | ==Methods== | ||
| - | These case studies of predictions by the [https://alphafoldserver.com AlphaFold3 Server] were done in November 2024. At that time, the [https://alphafold.ebi.ac.uk/ AlphaFold Database] had been generated with AlphaFold2. Superpositions and RMSD values for pairs of chains were obtained from [https://fatcat.godziklab.org/ FATCAT]. Superpositions and RMSD values for multiple-chain assemblies were obtained from [[PyMOL]] with its ''super'' command | + | These case studies of predictions by the [https://alphafoldserver.com AlphaFold3 Server] were done in November 2024. At that time, the [https://alphafold.ebi.ac.uk/ AlphaFold Database] had been generated with AlphaFold2. Superpositions and RMSD values for pairs of chains were obtained from [https://fatcat.godziklab.org/ FATCAT]. Superpositions and RMSD values for '''multiple-chain assemblies''' were obtained from [[PyMOL]] with its ''super'' command using default settings, and/or with [[ChimeraX]] using its Tools, Structure Analysis, MatchMaker, since FATCAT and the [https://www.rcsb.org/alignment RCSB Superposition utility] are limited to pairs of chains. Rocking animations were generated by [http://firstglance.jmol.org FirstGlance in Jmol], and morphs were [http://firstglance.jmol.org/videocapture.htm captured] from FATCAT. [[pLDDT]] averages for specified sequence ranges [[FirstGlance/How to get average pLDDT from AlphaFold models|were obtained using FirstGlance in Jmol]], after [[Converting AlphaFold3 CIF to PDB|converting predicted .cif files to .pdb format]]. |
==See Also== | ==See Also== | ||
Current revision
Summary:
The purpose of these case studies was for me (User:Eric Martz) to gain experience using the AlphaFold3 Server. See the resulting guidance for using the AlphaFold3 server.
Comparisons were made between 3 empirical models and predictions by the AlphaFold3 Server and the AlphaFold2 Database. Predictions by AlphaFold3 had high confidence (except for disordered regions), and high accuracy for naturally occurring sequences, including a 3-chain assembly with complex structural inter-dependencies (1a0r). In the two cases checked, the predictions of AlphaFold2 were slightly less accurate than those of AlphaFold3.
In the case of chain A of 8jrp, loop 397-423 is predicted to be disordered, but nevertheless has coordinates substantiated by the cryo-EM density map. AlphaFold3 had no confidence in its prediction of this segment, and placed it in a different position than in the empirical model.
One synthetic construct was analyzed (8k8g), for which no similar sequences or structures are in the PDB prior to 2024[1]. AlphaFold3's stated training set excludes PDB entries after September, 2021. AlphaFold3 had no confidence in its prediction for the fold of this chain, which was very different from the empirical structure. The empirical model is a homodimer, but none of the 5 homodimers predicted by AlphaFold3 (with no confidence, ipTM 0.10-0.37) had inter-chain contacts similar to those in the empirical structure.
AlphaFold3 Example 8AW3
Summary: This is one of three examples featured in the AlphaFold3 Server. AlphaFold3 accurately predicted the fold of the middle 341 residues of tRNA A34 deaminase (UniProt Q381Q7), when compared to the empirical model in 8aw3. It made no confident predictions for what appear to be 90 residues in two disordered loops, and a 25-residue disordered C terminus. In contrast, the prediction from the AlphaFold2 Database was bent relative to the empirical model, but otherwise largely accurate.
UniProt Q381Q7 (tRNA A34 deaminase) is the longest chain in the example 8aw3 provided by the AlphaFold3 Server. In 8aw3, its chain name is "3". 8aw3 is a cryo-EM structure with resolution 3.6 Å. 8aw3 indicates that the experimental material included full-length chain 3 of 369 residues, but 28 end-residues lack coordinates, so the coordinates run from UniProt 4-344, length 341. Within this range, there are two loops lacking coordinates of lengths 49 and 13, leaving 279 amino acids with coordinates. Neither missing loop is predicted to be disordered at UniProt, although the estimated disorder propensity is high for both.
Q381Q7: AlphaFold3 vs Empirical Structure
The overall 4-chain prediction (for the complex in 8aw3) has pTM of 0.75, at the low end of the range 0.7-0.9 deemed "confident" by the server. Both of the missing loops have "no-confidence" (red) coordinates. The longer one is modeled as a large circle protruding from the surface of the protein, typical for loops with intrinsic disorder. The average pLDDT for chain 3 is 68 (weak confidence), but with the large disordered loop removed, this increases to 74[2].
The AlphaFold3 prediction superposes (FATCAT rigid) with all 279 alpha carbons of chain 3 of 8aw3 with RMSD 2.2 Å. The morph shows the largest discrepancy in the red (no confidence) loop at center bottom, which has sequence SNSGCRKSNR (238-247 in the numbering of both UniProt and 8aw3). These residues have coordinates in 8aw3, but 3 residues, CRK, have incomplete sidechains.
|
AlphaFold3 prediction for Q381Q7. pLDDT > 70 is confident.
|
FATCAT morph between AlphaFold3 prediction and chain 3 of 8aw3. |
Q381Q7: AlphaFold3 vs AlphaFold2 Database
|
The long disordered loop was in different positions in the AlphaFold3 vs. AlphaFold2 Database models. When it was included, FATCAT superposed it between models, causing the RMSD to be 4.8-4.9 Å (rigid or flexible superposition). Deleting that loop (57-108) from the models resulted in a much better flexible superposition RMSD value of 2.0 Å for 312 alpha carbons (98% of the 317 present). This close superposition required flexibility. The rigid FATCAT superposition still had RMSD 4.5 Å. The morph shows bending and twisting between the two models. Since the AlphaFold2 Database prediction is bent/twisted relative to the AlphaFold3 prediction, and since the latter superposes well with the empirical model in a rigid superposition, the AlphaFold2 Database prediction should not superpose as well with the empirical model in a rigid superposition. Indeed, the rigid superposition has RMSD 4.1 Å for 270 (of 279) alpha carbons (morph not shown). (Flexible superposition achieved RMSD 2.4 Å.) | |
|
FATCAT morph between AlphaFold3 prediction and AlphaFold2 Database prediction for Q381Q7. |
8JRP Chain A: Partially Untemplated
|
In November 2024, when this analysis was done, the AlphaFold3 Server FAQ said that when empirical templates exist in the wwPDB, they would be used unconditionally, adding that “… the server searches PDB for template structures with a cutoff date of 30th September 2021 …”. According to this, the N-terminal half of chain A of 8jrp[3] (120-519) should be untemplated, since a sequence search at RCSB finds no matching entries prior to 2023. Being untemplated should make structure prediction more challenging. Of course, a structure need not be sequence-identical to serve as a template. A search for similar chain structures finds no good matches (top hits "not significantly similar" according to FATCAT), consistent with this N-terminal half being untemplated. The C-terminal half (520-869) is templated by 2 entries, e.g. 1c4z[4], a 1999 X-ray 2.6 Å resolution structure. | |
|
Chain A of 8jrp: putatively untemplated half, templated half. |
Missing/disordered
N-terminal residues 1-119, missing in 8jrp, are predicted to be partially disordered. Loop 170-230, missing from 8jrp, is predicted to be disordered at RCSB. Sequence range 388-424 is predicted to be disordered, but has coordinates in 8jrp.
RCSB sequence graphic (simplified) for chain A of 8jrp. Gray bars are missing residues. *Brown bars are residues missing atoms in their sidechains. |
AlphaFold 3 prediction
Chain A in 8jrp is human ubiquitin-protein ligase E3A. The full-length sequence, Q05086, has 875 amino acids. The 3.6 Å resolution cryo-EM structure indicates that the full-length protein was imaged, but the N-terminal end of the model lacks coordinates for 119 residues. The ends of the model are sequence numbers 120 and 869 (length 750), but one loop of length 61 (170-230) is missing coordinates, leaving 689 amino acids with coordinates. Conveniently, the sequence numbers in UniProt and 8jrp are the same.
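The residue bookkeeping in this paragraph can be checked with a few lines of Python. This is only an illustrative sketch: the function names `span_length` and `residues_with_coordinates` are hypothetical helpers, not part of any tool used in these case studies, and the ranges are the ones quoted in the text.

```python
def span_length(first, last):
    """Number of residues in an inclusive sequence range."""
    return last - first + 1


def residues_with_coordinates(model_range, missing_ranges):
    """Residues in the modeled range minus those in missing loops or ends."""
    total = span_length(*model_range)
    missing = sum(span_length(first, last) for first, last in missing_ranges)
    return total - missing


# Chain A of 8jrp: model spans 120-869, with loop 170-230 lacking coordinates.
print(span_length(120, 869))                               # 750
print(span_length(170, 230))                               # 61
print(residues_with_coordinates((120, 869), [(170, 230)])) # 689
```

The same helper reproduces the count of 188 residues with coordinates for phosducin in 1a0r (length 245, missing 1-12, 38-67 and 231-245).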
|
Chain A of 8jrp. Missing ends and loops shown as "empty baskets" by FirstGlance in Jmol.
|
AlphaFold3 prediction for Q05086[5]. pLDDT > 70 is confident.
|
Morph of FATCAT rigid superposition between AlphaFold3 prediction and chain A of 8jrp, RMSD 2.1 Å. |
AlphaFold3 was unable to predict with confidence the loop 170-230 (length 61) missing in 8jrp, and predicted to be disordered. The average pLDDT for the predicted loop was 40.2 (no confidence). Similarly, the N-terminal residues 1-119 missing from 8jrp were predicted with average pLDDT of 55.5 (low confidence). The 27-residue loop 397-423, predicted to be disordered but with coordinates in 8jrp, appears red in the predicted model with average pLDDT 47.1 (no confidence). In contrast, the average pLDDT for the residues having coordinates in 8jrp (120-169, 231-869) is 86.1 (confident).
The AlphaFold3 prediction superposed 688/689 alpha carbons onto chain A of 8jrp with RMSD 2.1 Å. Thus, the prediction is very close to the empirical model. The above morph shows that the largest discrepancy between the prediction and the empirical model occurred in the 27-residue loop 397-423, predicted to be disordered. These residues have coordinates in 8jrp, and the EM density map shows strong evidence for their positioning in the empirical model.
|
EM density for residues 388-424 in 8jrp. |
Templated vs. Untemplated
The apparently untemplated coordinates in chain A of 8jrp are 120-519. This range is missing coordinates for the 61-residue disordered loop. After removing coordinates for the 37-residue loop that is predicted to be disordered, but which has coordinates, there are 302 residues remaining. Superimposing these 302 alpha carbons with the AlphaFold3 prediction gave RMSD 0.9 Å. Thus, the prediction for the putatively untemplated half was excellent.
The templated coordinates are 520-869, length 350. Superposition gave RMSD 1.3 Å. In conclusion, the putatively untemplated half was predicted as well as the templated half.
Five AlphaFold3 models
The AlphaFold3 server delivers five models. In this instance, four had pTM 0.77, and one (model_3.cif) had pTM 0.76. Superposing the latter with the empirical model gave RMSD 2.1 Å for all 689 alpha carbons in the empirical model. This is the same RMSD as obtained with the "top" model (model_0.cif). In conclusion, there was no significant difference between the abilities of these two of the five models to superpose with the empirical model.
AlphaFold2 Database prediction
A rigid superposition of the AlphaFold2 Database prediction for Q05086 with chain A of 8jrp gave RMSD 3.0 Å. The main discrepancy, aside from that expected for the loop 397-423 (red, no confidence), was rotation of the C-terminal domain. Yet a flexible FATCAT superposition (allowing twists) gave no improvement. In conclusion, the AlphaFold3 prediction was better than the AlphaFold2 prediction.
1a0r: Three Sequence-Distinct Chains
Empirical Model 1a0r
|
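1a0r is an assembly of three sequence-distinct chains, and a highly inter-dependent structure. It was published in 1998 as an X-ray structure with resolution 2.8 Å.
- Chain B, transducin beta, length 339, forms a compact seven-bladed beta-propeller, with the exception of its 48-residue N-terminus, which forms a helix wrapping against the beta-propeller.
- Chain G, transducin gamma, length 65 (of 74), is partially helical, and wraps around the beta-propeller, without any compact domain of its own. It pairs with the N-terminus of chain B.
- Chain P, phosducin, length 245, forms a compact alpha/beta domain with its C-terminal 123 residues. Its ~95-residue N-terminus, partially helical, wraps against the beta-propeller, without contacting chain G or the N-terminus of chain B. This is the only chain with missing residues: 12 at the N terminus, a loop of 30 (38-67), and 15 at the C terminus. All missing residues are predicted to be in disordered segments, according to simplified sequence annotations from RCSB.org.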
|
AlphaFold3 Prediction for 1a0r
|
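The three sequences were submitted to AlphaFold3 as a single job. The job is templated, since 1a0r was published in 1998. There are a total of 592 amino acids in 1a0r. AlphaFold3 filled in the 57 missing residues, making a total of 649. For the AlphaFold3 prediction, pTM was 0.86, and ipTM was 0.91, indicating good confidence overall. As expected, the disordered ends and loop were predicted with no confidence (red). Average pLDDT for alpha carbons (obtained with FirstGlance):
Key: <50, no confidence; 50-70, low confidence, caution; 70-90, confident; >90, high confidence.
- 88.4 for the 3-chain assembly
- 95.8 for transducin beta, length 340
- 90.4 for transducin gamma, length 65
- 77.6 for phosducin, length 245 (including disordered regions)
  - 88.0 without disordered regions (sequence ranges 13-37, 68-230)
  - 30.8 for disordered ends (27 residues)
  - 54.6 for the disordered loop (30 residues)
Since FATCAT superposes only pairs of chains, superposition of the 3-chain assemblies was done in PyMOL, which gave RMSD 1.8 Å for 522 (88%) of 592 alpha carbons. Optimally, after iterative rejection of the most separated atom pairs, 375 alpha carbons (63% of 592) superposed with RMSD 0.63 Å. Pairs of individual chains:
- Chain B: 337 (99%) of 339 alpha carbons superposed with RMSD 1.0 Å.
- Chain G: 61 (94%) of 65 alpha carbons superposed with RMSD 0.5 Å.
- Chain P: 184 (98%) of 188 alpha carbons superposed with RMSD 1.8 Å.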
Superpositions with ChimeraX gave similar RMSD values (see Summary). | ||
|
AlphaFold3 prediction for the 3 chains in 1a0r. pLDDT >70 is confident.
|
ChimeraX morph between 1a0r and the AlphaFold3 prediction. Farnesyl ligand (GRAY SPHERES) was not available in AlphaFold3. Its absence accounts for the largest differences. |
Untemplated Synthetic Construct 8K8G
|
8kck and 8k8g are X-ray structures (resolutions 1.35 Å, worrisome Rfree values[6]) of a synthetic construct of 214 amino acids, with coordinates for 209 (missing coordinates for a C-terminal His tag). These entries have no journal publication, but indicate that the authors believe the biological assembly to be a homodimer.

These structures were deposited into the PDB in 2024. There are no previous entries with chains having closely similar structure, and none with similar sequences. Therefore, these structures appear to be untemplated for AlphaFold3, which states the cutoff date of its training set to be 30th September 2021.

AlphaFold3 is unable to predict the fold of this chain with confidence (pTM 0.36, average pLDDT 42). The empirical structure has one 7-strand beta sheet and 6 helices. The AlphaFold3 prediction is quite different, with one 5-strand beta sheet and one 2-strand sheet, plus 5 helices.

Similarly, given two chains, AlphaFold3's top ranked prediction had pTM 0.49, ipTM 0.37. It had no confidence in any of the 5 models it returned (not shown), either in the fold or in the contact between the two chains. Indeed, none of the homodimer predictions were structurally close to the empirical models (not shown).
| ||
|
8k8g monomer. Alpha Helices, Beta Strands, Loops. |
AlphaFold3 prediction for 1 copy of the sequence in 8k8g.
pLDDT >70 is confident.
|
Superposition RMSD Summary
Each superposition in this table is discussed above.
Superposition RMSD (Superposed αC/Total αC):

| Model 1 | Model 2 | FATCAT Rigid | FATCAT Flexible | PyMOL (note b) | ChimeraX (note c) |
|---|---|---|---|---|---|
| 8aw3 chain 3 tRNA deaminase | AF3 Q381Q7 | 2.2 Å (all 279) | | | |
| 8aw3 chain 3 tRNA deaminase | AF2 DB Q381Q7 | 4.1 Å (270/279) | 2.4 Å | | |
| AF3 Q381Q7 (note a) | AF2 DB Q381Q7 (note a) | 4.5 Å | 2.0 Å (312/317) | | |
| 8jrp chain A ubiquitin ligase | AF3 Q05086 | 2.1 Å (688/689) | | 1.9 Å (636/689, 92%); 1.0 Å (540/689, 78%) | |
| 8jrp chain A ubiquitin ligase | AF3 Q05086 Model 3 | 2.1 Å (all 689) | | | |
| 8jrp chain A 120-519 untemplated? | AF3 Q05086 | 0.9 Å (all 302) | | | |
| 8jrp chain A 520-869 templated | AF3 Q05086 | 1.3 Å (all 350) | | | |
| 8jrp chain A ubiquitin ligase | AF2 DB Q05086 | 3.0 Å | (no improvement) | | |
| 1a0r 3-chain assembly | AF3 P62871 + P02698 + P19632 | | | 1.8 Å (522/592, 88%); 0.63 Å (375/592, 63%) | 1.6 Å (all 592); 0.8 Å (482/592, 81%); 0.5 Å (379/592, 64%) |

AF3 = AlphaFold3 Server. Model 0 (best of five, 0-4) was used unless otherwise noted.
AF2 DB = AlphaFold2 DataBase
Note a: Disordered loop 57-108 deleted from both models.
Note b: PyMOL super command, limited to alpha carbons, e.g. super empirical////ca, af3////ca.
Note c: ChimeraX Tools, Structure Analysis, MatchMaker. Default "pruning" eliminated pairs farther than 2.0 Å apart. A cutoff of 1.0 Å was also used.
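To make the entries in the table concrete, the following minimal Python sketch shows how an RMSD and a "superposed/total" percentage can be computed from paired alpha-carbon coordinates, with and without a distance cutoff. It is an illustration under stated assumptions, not the algorithm used by PyMOL or ChimeraX: it assumes the two models are already superposed and does not re-fit after rejecting pairs, as the real tools do. The function names and the toy coordinates are hypothetical.

```python
import math


def rmsd(pairs):
    """Root-mean-square distance over paired (x, y, z) coordinates."""
    if not pairs:
        raise ValueError("no atom pairs to compare")
    total = sum(math.dist(a, b) ** 2 for a, b in pairs)
    return math.sqrt(total / len(pairs))


def prune(pairs, cutoff):
    """Keep only pairs separated by no more than cutoff (in Å)."""
    return [(a, b) for a, b in pairs if math.dist(a, b) <= cutoff]


def summarize(pairs, total_atoms, cutoff=None):
    """Return (RMSD, pairs kept, percent of total_atoms kept)."""
    kept = prune(pairs, cutoff) if cutoff is not None else list(pairs)
    percent = round(100 * len(kept) / total_atoms)
    return rmsd(kept), len(kept), percent


# Toy data (made up): three pairs separated by 1, 0 and 3 Å.
toy = [((0, 0, 0), (1, 0, 0)), ((0, 0, 0), (0, 0, 0)), ((0, 0, 0), (3, 0, 0))]
print(summarize(toy, 3))              # all pairs
print(summarize(toy, 3, cutoff=2.0))  # pairs farther than 2.0 Å rejected

# Percentages as reported in the table for the 1a0r assembly.
print(round(100 * 522 / 592))  # 88
print(round(100 * 375 / 592))  # 63
```

Rejecting the most separated pairs always lowers the RMSD while reducing coverage, which is why each pruned value in the table is reported together with the number of alpha carbons retained.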
Methods
These case studies of predictions by the AlphaFold3 Server were done in November 2024. At that time, the AlphaFold Database had been generated with AlphaFold2. Superpositions and RMSD values for pairs of chains were obtained from FATCAT. Superpositions and RMSD values for multiple-chain assemblies were obtained from PyMOL with its super command using default settings, and/or with ChimeraX using its Tools, Structure Analysis, MatchMaker, since FATCAT and the RCSB Superposition utility are limited to pairs of chains. Rocking animations were generated by FirstGlance in Jmol, and morphs were captured from FATCAT. pLDDT averages for specified sequence ranges were obtained using FirstGlance in Jmol, after converting predicted .cif files to .pdb format.
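For readers who prefer a script, the pLDDT averaging over sequence ranges can be sketched in Python. This is not how FirstGlance does it; it is a minimal illustration that assumes the prediction has already been converted to PDB format and that pLDDT is stored in the B-factor column of each ATOM record, as in AlphaFold PDB files. The names `average_plddt` and `ca_line`, and the demo values, are hypothetical.

```python
def average_plddt(pdb_text, ranges, chain=None):
    """Average the B-factor column (pLDDT) over alpha carbons in the ranges.

    ranges is a list of inclusive (first, last) residue-number tuples.
    """
    values = []
    for line in pdb_text.splitlines():
        # Fixed-column PDB format: atom name in columns 13-16.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        resnum = int(line[22:26])
        if any(first <= resnum <= last for first, last in ranges):
            values.append(float(line[60:66]))
    if not values:
        raise ValueError("no alpha carbons found in the requested ranges")
    return sum(values) / len(values)


def ca_line(serial, resnum, plddt, chain="A"):
    """Build a synthetic alpha-carbon ATOM record (demo data only)."""
    return "ATOM  %5d  CA  ALA %s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, chain, resnum, 0.0, 0.0, 0.0, 1.0, plddt)


# Made-up records: residue 12 lies outside the requested range 13-37.
demo = "\n".join([ca_line(1, 12, 30.0), ca_line(2, 13, 80.0), ca_line(3, 14, 90.0)])
print(average_plddt(demo, [(13, 37)]))  # 85.0
```

Passing several ranges, e.g. `[(13, 37), (68, 230)]`, gives one average over all of them, which matches how the "without disordered regions" values above are described.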
See Also
- AlphaFold3 Server alphafoldserver.com
- How to predict structures with AlphaFold
- AlphaFold/Index lists pages in Proteopedia that relate to AlphaFold.
Notes
- ↑ For the sequence in 8k8g, the top BLAST hit in UniProt is 37% identical for 79% of length.
- ↑ Average pLDDT values were obtained with FirstGlance in Jmol, which reports the minimum, average, and maximum in the upper right as "Reliability".
- ↑ 8jrp was chosen despite its poor resolution of 3.6 Å because the better-resolution 8jrn, 2.6 Å, is not available in PDB format. A rigid superposition by FATCAT matched all 689 residues with coordinates, giving RMSD 0.7 Å, a near perfect match.
- ↑ A FATCAT rigid superposition of 8jrp chain A with 1c4z chain C shows a twist between the two domains. A FATCAT flexible superposition with one twist gives RMSD 1.3 Å for all 350 alpha carbons. Thus, 1c4z would be an excellent template for predicting the C-terminal half of 8jrp chain A.
- ↑ Predictions were done for the full length Q05086 sequence (shown above), and with 1-119 removed. Rigid FATCAT superposition between the two predictions (only the 689 alpha carbons present in 8jrp) gave RMSD 1.4 Å. Thus, inclusion of 1-119 had little if any effect on the prediction of the remainder of the structure.
- ↑ 8kck: Rfree worse than average. PDB-ReDo increases both R and Rfree! 8k8g: PDB-ReDo reduces R from 0.20 to 0.17, and Rfree from 0.222 ("unreliable") to 0.207 ("worse than average"). Characterizations "worse than average" and "unreliable" are from FirstGlance, which uses statistics from the PDB.















