How to find a protein's best structure

From Proteopedia

(Difference between revisions)

Revision as of 20:46, 20 October 2024

Here is a general guide to finding a structure for a protein molecule of interest. This procedure is one of many possible. It is the one favored by User:Eric Martz.

1 Empirical Models
- 1.1 Is there an empirical model?
  - 1.1.1 Simple search for empirical models (via PIR)
  - 1.1.2 Advanced search for empirical models (RCSB PDB)
- 1.2 Has AlphaFold predicted a model?

Empirical Models

Empirical models are structures determined empirically (experimentally) by X-ray crystallography, cryo-electron microscopy, or solution NMR. Empirical models are usually the most accurate and reliable, especially when they have good resolution. All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the World Wide Protein Data Bank (the "PDB").

Each model in the PDB has a unique 4-character identification code (PDB ID) that begins with a numeral, and has letters or numerals for the last 3 characters. Examples are 1d66, 4mdh, 9ins.
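The ID format described above can be checked mechanically. The following is a minimal sketch that encodes only the rule stated in the text (one numeral followed by three letters or numerals); the function name is made up for this illustration.

```python
import re

# One numeral, then exactly three letters or numerals, as described in the text.
PDB_ID_PATTERN = re.compile(r"[0-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(text: str) -> bool:
    """Return True if text has the shape of a 4-character PDB ID."""
    return PDB_ID_PATTERN.fullmatch(text.strip()) is not None


for candidate in ["1d66", "4mdh", "9ins", "abcd", "1d6"]:
    print(candidate, looks_like_pdb_id(candidate))
```

A match only says that the string has the right shape; it does not say that an entry with that ID exists in the PDB.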

Here are two methods for finding out if your query amino acid sequence, or parts of it, have empirically-determined 3D structures in the PDB.

Is there an empirical model?

Empirically-determined models are usually the most reliable.

Simple search for empirical models (via PIR)

At UniProt.Org, find your protein and click on Structure (blue button at the left).

- If there is a section 3D Structure Databases with a column labeled PDB entry containing 4-character PDB IDs, these are empirical structures for your protein. Pay attention to the “Positions” column, which gives the sequence number range covered by each model.
  - To explore one of these models, write down its 4-character PDB code. Then see #How To Explore 3D Models below.
- If there is no “PDB entry” column, then there are no sequence-identical empirical structures for your protein. Then try the Advanced search method below.
- Some proteins have no Structure section (e.g. K4QDG1_SACBA). Then try the Advanced search method below.

If empirical structures exist, see #How To Explore 3D Models below. If they are satisfactory, then you don't need a homology model.
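The same check can be done on a UniProt entry that has been downloaded as JSON. The sketch below rests on an assumption about the JSON layout: that cross-references sit in a list named uniProtKBCrossReferences, each with a database name, an id, and key/value properties such as Method, Resolution and Chains. The sample entry is hand-written for illustration and is not real UniProt data; verify the field names against an actual download before relying on them.

```python
def pdb_cross_references(entry: dict) -> list[dict]:
    """Collect PDB cross-references from a UniProt entry given as a dict.

    Field names are assumptions about the UniProt JSON layout.
    """
    hits = []
    for xref in entry.get("uniProtKBCrossReferences", []):
        if xref.get("database") != "PDB":
            continue
        props = {p["key"]: p["value"] for p in xref.get("properties", [])}
        hits.append(
            {
                "pdb_id": xref["id"],
                "method": props.get("Method"),
                "resolution": props.get("Resolution"),
                # Chains also carries the covered sequence positions.
                "chains": props.get("Chains"),
            }
        )
    return hits


# Hand-written, hypothetical entry used only to show the assumed layout.
sample_entry = {
    "uniProtKBCrossReferences": [
        {
            "database": "PDB",
            "id": "1ABC",
            "properties": [
                {"key": "Method", "value": "X-ray"},
                {"key": "Resolution", "value": "2.00 A"},
                {"key": "Chains", "value": "A=1-149"},
            ],
        },
        {"database": "Pfam", "id": "PF00000", "properties": []},
    ]
}

for hit in pdb_cross_references(sample_entry):
    print(hit)
```

An empty result corresponds to the case with no “PDB entry” column, where the advanced search is the next step.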

Advanced search for empirical models (RCSB PDB)

This method takes more time but gives you more information. It will find empirical structures that have sequence similarity to the query. Such hits enable a high-quality homology model.

For example, if your query is calmodulin from the lancelet (Q9UB37, CALM2_BRALA), zero empirical structures are listed at UniProt. However, the query is 97% identical in sequence to human calmodulin (P62158 CALM_HUMAN) and to calmodulins from other taxa, for which there are numerous full-length empirical structures. A very high quality homology model can be constructed.

Advanced search procedure:

1. Copy the FASTA format sequence for your protein, for example, from UniProt.Org.
2. Note the length of your sequence.
3. At rcsb.org, go to Advanced Search.
4. Select Sequence under 'Advanced Search Query Builder'.
5. Paste your query sequence into the box.
6. Push the search button to run the search.
7. Scroll down to see the list of hits.
8. At the top of the list, change "Display Results as" to Polymer Entities, then push the search button again. This is crucial because it displays the identity percentages and alignments for the hits. It should be the default!
9. The best hits will be listed first. Notice that each hit starts with a large, bold PDB ID.
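For the first two steps of the procedure, the sequence and its length can be pulled out of a FASTA record with a few lines of code. This is a minimal sketch that reads only the first record in the text; the function name and the example record are invented for this illustration and do not correspond to a real protein.

```python
def read_first_fasta_record(fasta_text: str) -> tuple[str, str]:
    """Return (header, sequence) for the first record in FASTA text."""
    header = ""
    seen_header = False
    residues = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # a second record starts here; ignore the rest
            seen_header = True
            header = line[1:]
        else:
            residues.append(line)
    return header, "".join(residues)


# Made-up record, for illustration only.
example = """>demo_sequence hypothetical example record
MADQLTEEQI
AEFKEAFSLF
"""

header, sequence = read_first_fasta_record(example)
print(header)
print(sequence)
print("length:", len(sequence))
```

The sequence string, without the header line, is what goes into the search box, and its length is the number to compare against the aligned region of each hit.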

For each hit, notice the Sequence Identity % above the sequence alignment box.

Also notice the Region range, which tells you how many of your query residues align with the hit. Compare this to the full length of your query sequence.
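The comparison between the aligned region and the full query length is simple arithmetic. The sketch below assumes the region is reported as 1-based, inclusive start and end positions in the query; the function name is hypothetical.

```python
def query_coverage(region_start: int, region_end: int, query_length: int) -> float:
    """Fraction of the query covered by an aligned region.

    Assumes 1-based, inclusive positions in the query sequence.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    if not 1 <= region_start <= region_end <= query_length:
        raise ValueError("region must lie within the query")
    aligned_residues = region_end - region_start + 1
    return aligned_residues / query_length


# A hit aligned to residues 5-148 of a 149-residue query.
print(f"{query_coverage(5, 148, 149):.1%}")
```

A hit with high sequence identity but low coverage may describe only one domain of the query, so both numbers matter when choosing a structure or a template.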

If you click the Download button in the list of hits, you will get the CIF file. If you need PDB file format, click on the PDB ID code and open the Download menu on that single entry page to get all format options.
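Files can also be fetched directly by PDB ID. The sketch below only builds the download address and does not contact any server. The URL pattern (files.rcsb.org/download/ followed by the ID and a .cif or .pdb extension) is an assumption about the RCSB file service, so check it in a browser before scripting against it; the function name is made up for this example.

```python
def rcsb_file_url(pdb_id: str, file_format: str = "cif") -> str:
    """Build a download URL for one entry (assumed RCSB URL pattern)."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum() or not pdb_id[0].isdigit():
        raise ValueError(f"not a 4-character PDB ID: {pdb_id!r}")
    if file_format not in ("cif", "pdb"):
        raise ValueError("file_format must be 'cif' or 'pdb'")
    return f"https://files.rcsb.org/download/{pdb_id}.{file_format}"


print(rcsb_file_url("1d66"))
print(rcsb_file_url("4mdh", "pdb"))
```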

Has AlphaFold predicted a model?

Empirical models are the most reliable, but if none are available, AlphaFold has an impressive track record of correctly predicting structures from sequence. Check the AlphaFold Database for a model of your protein of interest. You can also submit a sequence and get a prediction: How to predict structures with AlphaFold. Another model prediction service with a good track record is RoseTTAFold. Submit your sequence there, making sure to check RoseTTAFold as the method. With any of these methods, download the predicted PDB file and then upload it to FirstGlance in Jmol for exploration and analysis. FirstGlance automatically colors predicted models by reliability.
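The reliability of an AlphaFold model can also be inspected outside a viewer. AlphaFold writes its per-residue confidence score (pLDDT, 0 to 100) into the B-factor column of the PDB file, so it can be read from the fixed-width ATOM records. This is a minimal sketch that reads one value per residue from the alpha carbon; the function names are invented here, and the default cutoff of 70 follows the commonly used pLDDT bands rather than anything stated on this page.

```python
def plddt_by_residue(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain, residue number) to the B-factor value of its CA atom.

    For AlphaFold models that value is the pLDDT confidence score.
    """
    scores = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":  # atom name, columns 13-16
            continue
        chain = line[21]                 # chain ID, column 22
        residue_number = int(line[22:26])  # columns 23-26
        scores[(chain, residue_number)] = float(line[60:66])  # columns 61-66
    return scores


def low_confidence_residues(scores, cutoff=70.0):
    """Residues whose score falls below the cutoff, in sorted order."""
    return sorted(key for key, value in scores.items() if value < cutoff)
```

Calling plddt_by_residue on the text of a downloaded model and passing the result to low_confidence_residues gives a quick list of regions to treat with caution. For empirical models the same column holds temperature factors, which have a different meaning.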

Proteopedia Page Contributors and Editors

Eric Martz


@@ Line 1: / Line 1: @@
-== Do you need a homology model? ==
+Here is a general guide to finding a structure for a protein molecule of interest. This procedure is one of many possible. It is the one favored by [[User:Eric Martz]].
-You don’t need a homology model if the amino acid sequence of interest (the query sequence) already has an empirically determined 3D structure. Structures determined empirically, by X-ray crystallography or (much less often) by solution NMR or cryo-EM, will almost always be more accurate than a homology model.
+== Empirical Models ==
-If [[AlphaFold]] has predicted a model for your amino acid sequence of interest, it will often be more accurate than a homology model, and in most cases, a homology model won't be possible due to lack of a suitable template.
+[[Empirical models]] are structures determined empirically (experimentally) by [[X-ray crystallography]], [[cryo-Electron Microscopy]], [[NMR|solution NMR]]. Empirical models are usually the most accurate and reliable, especially when they have good [[resolution]]. All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[World Wide Protein Data Bank]] (the "PDB").
-=== Has AlphaFold predicted a model? ===
+Each model in the PDB has a unique 4-character identification code ([[PDB ID]]) that begins with a numeral, and has letters or numerals for the last 3 characters . Examples are 1d66, 4mdh, 9ins.
-Empirical models are the most reliable, but if none are available, [[AlphaFold]] has an impressive track record of correctly predicting structures from sequence. Check the [http://alphafold.ebi.ac.uk AlphaFold Database] for a model of your protein of interest. You can also submit a sequence and get a prediction: [[How to predict structures with AlphaFold]]. Another model prediction service with a good track record is [http://robetta.bakerlab.org RoseTTaFold]. Submit your sequence there, making sure to check ''RoseTTaFold'' as the method. With any of these methods, download the predicted [[PDB file]] and then upload it to [http://firstglance.jmol.org FirstGlance in Jmol] for exploration and analysis. FirstGlance automatically colors predicted models by reliability.
-=== Is there an empirical model? ===
+Here are two methods for finding out if your query amino acid sequence, or parts of it, have [[Empirical models|empirically-determined 3D structures]] in the [[PDB]].
-[[Empirical models|Empirically-determined]] models are usually the most reliable. All published, empirically-determined, atomic-resolution, macromolecular 3D structures are available in the [[World Wide Protein Data Bank]].
-Each model in the PDB has a unique 4-character identification code ([[PDB ID]]) that begins with a numeral, and has letters or numerals for the last 3 characters . Examples are 1d66, 4mdh, 9ins.
-Here are two methods for finding out if your query amino acid sequence, or parts of it, have [[Empirical models|empirically-determined 3D structures]] in the PDB.
+=== Is there an empirical model? ===
+[[Empirical models|Empirically-determined]] models are usually the most reliable.
 ==== Simple search for empirical models (via PIR) ====
@@ Line 56: / Line 55: @@
 The 18 residues marked X were not included in the identity calculation. In contrast, when the same sequence search is performed at [http://www.ebi.ac.uk/pdbe PDB-Europe], 100% sequence identity is reported. However, other aspects of the report at PDB-Europe are less satisfactory (e.g. the length of the alignment is not stated; the sequences are not numbered) and hence we recommend using rcsb.org despite its misleading sequence identity percentages.-->
+=== Has AlphaFold predicted a model? ===
+Empirical models are the most reliable, but if none are available, [[AlphaFold]] has an impressive track record of correctly predicting structures from sequence. Check the [http://alphafold.ebi.ac.uk AlphaFold Database] for a model of your protein of interest. You can also submit a sequence and get a prediction: [[How to predict structures with AlphaFold]]. Another model prediction service with a good track record is [http://robetta.bakerlab.org RoseTTaFold]. Submit your sequence there, making sure to check ''RoseTTaFold'' as the method. With any of these methods, download the predicted [[PDB file]] and then upload it to [http://firstglance.jmol.org FirstGlance in Jmol] for exploration and analysis. FirstGlance automatically colors predicted models by reliability.
