User:Wayne Decatur/Sequence analysis tools

From Proteopedia

Current revision: 19:45, 11 January 2021
* [http://biit.cs.ut.ee/gprofiler/ ProViz] - a web-based visualization tool to investigate the functional and evolutionary features of protein sequences.
* [http://prody.csb.pitt.edu/index.html ProDy Project] - "ProDy is a free and open-source Python package for protein structural dynamics analysis". It also appears to handle protein sequence analysis and working with PDB files.

==Aligning==

* [https://github.com/fomightez/msucle-binder Muscle-binder] - Launchable Jupyter environment for running command line-based MUSCLE via Binder. That page also links to the main MUSCLE resources.

==BLAST+==

* [https://github.com/fomightez/blast-binder Blast-binder] - Launchable Jupyter environment for running command line-based BLAST via Binder. That page also links to the main BLAST resources. The launched notebooks illustrate ways to easily work with the output in Python.
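To give a rough idea of what working with BLAST+ output in Python can look like, here is a minimal sketch that parses tabular output and keeps the best hit per query. It assumes the results were saved with `-outfmt 6` using the default 12 columns (if custom columns were requested, `OUTFMT6_COLUMNS` has to be edited to match). The function names and the toy hit lines are made up for illustration and are not taken from the Binder notebooks.

```python
import csv
import io

# Default column order for BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_outfmt6(handle):
    """Yield one dict per hit line, with numeric columns converted."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        yield hit


def best_hit_per_query(hits):
    """Keep the highest-bitscore hit for each query sequence."""
    best = {}
    for hit in hits:
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Toy input made up for illustration; these are not real BLAST results.
toy = (
    "q1\tsubjA\t98.5\t200\t3\t0\t1\t200\t11\t210\t1e-100\t370\n"
    "q1\tsubjB\t75.0\t180\t45\t2\t5\t184\t20\t199\t3e-40\t160\n"
    "q2\tsubjC\t60.2\t90\t36\t1\t1\t90\t1\t89\t2e-10\t60.5\n"
)
best = best_hit_per_query(parse_outfmt6(io.StringIO(toy)))
for query, hit in best.items():
    print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```

Using the standard `csv` module keeps it dependency-free; with pandas installed, `pandas.read_csv(path, sep="\t", names=OUTFMT6_COLUMNS)` would be a common alternative.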

==Circos==

* [https://github.com/fomightez/circos-binder Circos on Jupyter] - Circos in a browser-based Jupyter environment served from MyBinder.org, so it is available in a browser with one click via Binder. That page also links to the main Circos resources. The launched notebooks illustrate ways to easily work with the output in Python.

==Converters==
* [http://sequenceconversion.bugaco.com/converter/biology/sequences/clustal_to_fasta.php Sequence conversion Provided by bugaco.com] - a lot of conversion choices with an easy interface. When I had an interleaved Clustal-format alignment, it converted nicely to a straight FASTA listing of the sequence for every organism.
* [http://toolkit.tuebingen.mpg.de/reformat Reformat utility of Max Planck Institute for Developmental Biology Bioinformatics Toolkit] converts sequences or multiple sequence alignments to various formats.
* [https://www.hiv.lanl.gov/content/sequence/FORMAT_CONVERSION/form.html?sample_input=1 Format Converter] - converts nucleotide and protein sequences in various formats to many other formats.
* [http://bioinformatics.org/sms2/three_to_one.html Three to One] converts amino acid sequences written with three-letter codes to single-letter codes.
* [http://bioinformatics.org/sms2/one_to_three.html One to Three] converts amino acid sequences written with single-letter codes to three-letter codes.
* [http://biit.cs.ut.ee/gprofiler/gconvert.cgi g:Convert] - Gene ID Converter. Handles yeast and a very large list of other organisms.
* [https://github.com/fhcrc/seqmagick seqmagick-An imagemagick-like frontend to Biopython SeqIO], can convert from fasta to phylip, etc.
* [http://arep.med.harvard.edu/labgc/adnan/projects/Utilities/revcomp.html Reverse and/or reverse complement DNA sequences] - handles degenerate bases.
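The simpler conversions these web tools perform can be sketched in a few lines of Python. The helper names below are hypothetical, not taken from any of the linked sites. `three_to_one` and `one_to_three` cover only the 20 standard amino acids (anything else becomes `X` or `Xaa`), and `reverse_complement` uses the standard IUPAC complement pairs, so degenerate bases such as R/Y or B/V are handled.

```python
# Standard three-letter to one-letter amino acid codes (20 standard residues only).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

# IUPAC nucleotide codes paired with their complements (upper and lower case).
IUPAC_COMPLEMENT = str.maketrans(
    "ACGTRYKMSWBDHVNacgtrykmswbdhvn",
    "TGCAYRMKSWVHDBNtgcayrmkswvhdbn",
)


def three_to_one(text):
    """Convert e.g. 'Met-Ala-Ser' or 'MetAlaSer' to 'MAS'; unknown codes become 'X'."""
    letters = "".join(ch for ch in text if ch.isalpha())  # drop separators
    if len(letters) % 3:
        raise ValueError("input does not split evenly into three-letter codes")
    codes = [letters[i:i + 3].capitalize() for i in range(0, len(letters), 3)]
    return "".join(THREE_TO_ONE.get(code, "X") for code in codes)


def one_to_three(seq, sep="-"):
    """Convert e.g. 'MAS' to 'Met-Ala-Ser'; unknown letters become 'Xaa'."""
    return sep.join(ONE_TO_THREE.get(ch, "Xaa") for ch in seq.upper())


def reverse_complement(dna):
    """Reverse complement; characters outside the IUPAC table pass through unchanged."""
    return dna.translate(IUPAC_COMPLEMENT)[::-1]


print(three_to_one("Met-Ala-Ser"))
print(one_to_three("MAS"))
print(reverse_complement("ATGCRN"))
```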

==Random sequence generators==
* http://emboss.sourceforge.net/ - shuffleseq from EMBOSS shuffles a set of sequences while maintaining their composition.
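The core idea of a composition-preserving shuffle is a random permutation of the residues, so the count of each letter stays the same. Below is a minimal Python sketch, assuming only single-letter (mononucleotide or amino acid) composition needs to be kept. It does not preserve dinucleotide or other k-mer frequencies, and I am not claiming this is how shuffleseq is implemented internally; the function name is hypothetical.

```python
import random
from collections import Counter


def shuffle_preserving_composition(seq, seed=None):
    """Return a random permutation of seq; letter counts are unchanged."""
    rng = random.Random(seed)  # seeded generator so results can be reproduced
    residues = list(seq)
    rng.shuffle(residues)  # in-place uniform shuffle of the positions
    return "".join(residues)


original = "ATGGCGTACGTTAGCAAT"
shuffled = shuffle_preserving_composition(original, seed=42)
print(shuffled)
print(Counter(shuffled) == Counter(original))
```

Passing a seed makes a shuffled control set reproducible; leaving it as `None` gives a different shuffle each run.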

==Extract physico-chemical data from Protein or DNA sequences==

* [https://www.iitm.ac.in/bioinfo/SBFE/index.html Seq2Feature webserver] is a comprehensive web-based feature extraction tool that computes features derived from protein and DNA sequences. It can calculate 252 protein-based and 42 DNA-based descriptors. Major protein sequence-based descriptors include physico-chemical, energetic and conformational properties, mutation matrices, and contact potentials. There is a corresponding article [https://academic.oup.com/bioinformatics/advance-article-abstract/doi/10.1093/bioinformatics/btz432/5499130?redirectedFrom=fulltext here].
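As a small taste of what a sequence-derived physico-chemical descriptor is, here is a sketch of two classic ones: the Kyte-Doolittle hydropathy average (GRAVY) with a sliding-window profile for proteins, and GC content for DNA. This is only an illustration of the general idea under my own assumptions (standard 20-letter alphabet, window of 9 residues), not how Seq2Feature computes its descriptors; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Mean Kyte-Doolittle value over all residues (KeyError on non-standard letters)."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return sum(values) / len(values)


def hydropathy_profile(protein, window=9):
    """Mean hydropathy for each full sliding window (one value per window start)."""
    values = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def gc_content(dna):
    """Fraction of G and C bases in a DNA sequence."""
    seq = dna.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


print(round(gravy("MKTAYIAKQR"), 3))
print(hydropathy_profile("MKTAYIAKQRQISFVKSHFSRQ", window=9)[:3])
print(gc_content("ATGCGC"))
```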

==Orthology==
* [http://eggnogdb.embl.de/#/app/home EggNOG] - A database of orthologous groups and functional annotation
* [https://github.com/soedinglab/hh-suite/wiki#building-customized-databases HH-suite3 for sensitive protein sequence searching based on HMM-HMM alignment]

==Pattern Matching==

* [https://github.com/fomightez/patmatch-binder patmatch-binder] - Launchable Jupyter environment for running command line-based PatMatch via Binder. That page also links to other sequence pattern matching resources. The launched notebooks illustrate ways to easily work with the output in Python.

* [https://github.com/soedinglab/hh-suite/wiki#building-customized-databases HH-suite3 for sensitive protein sequence searching based on HMM-HMM alignment]

* [http://eddylab.org/infernal/ Infernal: inference of RNA alignments]

<blockquote> Infernal builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments.</blockquote>

* [http://hmmer.org/publications.html HMMER: biosequence analysis using profile hidden Markov models]
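PatMatch has its own pattern syntax, which I am not reproducing here. As a rough illustration of the basic idea behind nucleotide pattern searching, the sketch below translates a pattern written with IUPAC degenerate codes into a Python regular expression and reports every match start, including overlapping ones (using a zero-width lookahead). The helper names are hypothetical; it searches only the strand given and does not allow mismatches.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(pattern):
    """Translate an IUPAC pattern such as 'GANTC' into a regex string."""
    return "".join(IUPAC_DNA[base] for base in pattern.upper())


def find_matches(pattern, sequence):
    """Return 0-based start positions of all matches, overlapping ones included."""
    # Wrapping the pattern in a lookahead makes each match zero-width,
    # so finditer can report matches that overlap one another.
    regex = re.compile("(?=" + iupac_to_regex(pattern) + ")")
    return [m.start() for m in regex.finditer(sequence.upper())]


print(iupac_to_regex("GANTC"))
print(find_matches("GANTC", "GAATCCGACTCGGATC"))
print(find_matches("AA", "aaaa"))
```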

==Some sequence analysis but mostly OTHER==

==Nucleic acid system building and DNA structure design==
* [http://www.nupack.org/ NUPACK] - "NUPACK is a growing software suite for the analysis and design of nucleic acid structures, devices, and systems." It also seems to be able to do melting temperature and free energy calculations, among other things.
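NUPACK's calculations rest on detailed thermodynamic models, and nothing below reproduces them. For a quick sanity check on a short primer, though, the old Wallace rule of thumb (Tm of about 2 °C per A/T plus 4 °C per G/C) is sometimes used. It is a crude approximation, commonly said to be reasonable only for short oligos (roughly 14 nt or fewer), and it ignores salt and strand concentration. The function name is my own.

```python
def wallace_tm(oligo):
    """Rough Tm in deg C for a short DNA oligo: 2*(A+T) + 4*(G+C) (Wallace rule)."""
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


print(wallace_tm("ATGCATGCAT"))
```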

== Fungal Genome Resources ==

[http://1002genomes.u-strasbg.fr/news/news.html 1011 Saccharomyces cerevisiae genomes], associated with [https://www.ncbi.nlm.nih.gov/pubmed/29643504 Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Peter J, De Chiara M, Friedrich A, Yue JX, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, Cruaud C, Labadie K, Aury JM, Istace B, Lebrigand K, Barbry P, Engelen S, Lemainque A, Wincker P, Liti G, Schacherer J. Nature. 2018 Apr;556(7701):339-344. doi: 10.1038/s41586-018-0030-5. Epub 2018 Apr 11. PMID: 29643504].

[http://www.y1000plus.org 332 budding yeasts], associated with [https://www.ncbi.nlm.nih.gov/pubmed/30415838 Tempo and Mode of Genome Evolution in the Budding Yeast Subphylum. Shen XX, Opulente DA, Kominek J, Zhou X, Steenwyk JL, Buh KV, Haase MAB, Wisecaver JH, Wang M, Doering DT, Boudouris JT, Schneider RM, Langdon QK, Ohkuma M, Endoh R, Takashima M, Manabe RI, Čadež N, Libkind D, Rosa CA, DeVirgilio J, Hulfachor AB, Groenewald M, Kurtzman CP, Hittinger CT, Rokas A. Cell. 2018 Nov 29;175(6):1533-1545.e20. doi: 10.1016/j.cell.2018.10.023. Epub 2018 Nov 8. PMID: 30415838]. ([https://figshare.com/articles/Tempo_and_mode_of_genome_evolution_in_the_budding_yeast_subphylum/5854692 Figshare corresponding to the paper])
http://fungalgenomes.org/

== RNA Structure Analysis==
* [http://eddylab.org/infernal/ Infernal] - A downloadable program for sequence analysis using profiles of RNA sequence based on [http://rfam.xfam.org/ Rfam]-associated covariance models and secondary structure consensus. The program can also generate covariance models from RNA alignments. Binaries are available for Mac, Windows, and Linux. ([http://www.ncbi.nlm.nih.gov/pubmed/24008419?dopt=Abstract E. P. Nawrocki and S. R. Eddy, Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics 29:2933-2935 (2013). PMID: 24008419])

* [https://github.com/mmagnus/rna-tools/blob/master/index-of-tools.md rna-tools] - (previously known as 'rna-pdb-tools'): a toolbox to analyze sequences, structures and simulations of RNA. (It takes some navigating around to find what you want because a lot is there.)

== Analyze DNA curvature==

* [https://github.com/fomightez/bendit-binder bendit-binder] - use the [http://pongor.itk.ppke.hu/dna/bend_it.html#/bendit_intro Bend.it software] to predict DNA curvature from DNA sequences with the power of the Jupyter ecosystem served via MyBinder.org.

== Sequence Logo Generation ==

==Python-based utilities==
* [https://github.com/fhcrc/seqmagick seqmagick-An imagemagick-like frontend to Biopython SeqIO]. For example, it can convert from fasta to phylip, remove gaps from a fasta-formatted sequence, and describe all FASTA files in the current directory. Requires Biopython.
* See also the 'Binder'/notebook-related items earlier on this page, since I usually have worked out Python code to bring the output of other command-line-based software into Python, and the notebook-related items [https://github.com/fomightez/sequencework here], as I sometimes demonstrate script usage in launchable notebooks.
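seqmagick builds on Biopython, but for a quick one-off such as stripping gap characters out of an aligned FASTA file the Python standard library is enough. A minimal sketch, assuming '-' and '.' are the only gap characters and that output lines should be wrapped at 60 characters; the function names are hypothetical and this is not how seqmagick itself is written.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from FASTA text; multi-line sequences are joined."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def ungap(records, gap_chars="-."):
    """Remove gap characters from each sequence, keeping headers as-is."""
    table = str.maketrans("", "", gap_chars)
    for header, seq in records:
        yield header, seq.translate(table)


def write_fasta(records, width=60):
    """Format records as FASTA text with sequence lines wrapped at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


aligned = ">seq1 yeast\nATG--CTA\nGG.A\n>seq2\nA-TGCTAGGA\n"
result = write_fasta(ungap(read_fasta(io.StringIO(aligned))))
print(result, end="")
```

For a real file, `open("alignment.fasta")` can be passed to `read_fasta` in place of the `io.StringIO` object.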
==My own sequence work-related code==
* [https://github.com/fomightez/UGENE_help Working with UGENE analysis software]
* [https://github.com/fomightez/yeastmine Working with YeastMine]

* See also the 'Binder'/notebook-related items earlier on this page, since I usually have worked out Python code to bring the output of other command-line-based software into Python, and the notebook-related items [https://github.com/fomightez/sequencework here], as I sometimes demonstrate script usage in launchable notebooks.

* See also [https://github.com/fomightez/ my GitHub]

Current revision


Have not Categorized Yet

Aligning


BLAST+


Circos

  • Circos on Jupyter - Circos in a browser-based Jupyter environment served from MyBinder.org, so it is available in a browser with one click via Binder. That page also links to the main Circos resources. The launched notebooks illustrate ways to easily work with the output in Python.


Converters

Random sequence generators

Sequence shufflers


Extract physico-chemical data from Protein or DNA sequences

  • Seq2Feature webserver is a comprehensive web-based feature extraction tool that computes features derived from protein and DNA sequences. It can calculate 252 protein-based and 42 DNA-based descriptors. Major protein sequence-based descriptors include physico-chemical, energetic and conformational properties, mutation matrices, and contact potentials. There is a corresponding article here.

Orthology

Pattern Matching


Infernal builds consensus RNA secondary structure profiles called covariance models (CMs), and uses them to search nucleic acid sequence databases for homologous RNAs, or to create new sequence- and structure-based multiple sequence alignments.

Some sequence analysis but mostly OTHER

  • BioCyc Database Collection - "BioCyc is a collection of 3530 Pathway/Genome Databases (PGDBs), with tools for understanding their data." Other features listed include metabolic maps for thousands of organisms; RouteSearch for finding paths through the metabolic network; cross-organism search; searching all of BioCyc, or designated taxonomic groups, for named genes, proteins, metabolites, and pathways; multiple sequence alignment with MUSCLE and PatMatch queries through Pathway Tools; SmartTable displays; metabolomics and gene expression data analysis (including the Cellular Overview Omics Viewer); a multi-genome browser; and comparative genome analysis.


Good E. coli database

  • EcoProDB - The E. coli protein database (EcoProDB) integrates protein information identified on 2-D gels with other resources to provide a comparative platform for the expression levels of many heterogeneous proteins under different genetic and environmental conditions, using an interactive interface and search mechanism.


NGS

  • HOMER - "Software for motif discovery and next-gen sequencing analysis". Nice in that it actually explains some of the details and advantages of the browsers and file types.


Nucleic acid system building and DNA structure design

  • NUPACK - "NUPACK is a growing software suite for the analysis and design of nucleic acid structures, devices, and systems." It also seems to be able to do melting temperature and free energy calculations, among other things.

Fungal Genome Resources

1011 Saccharomyces cerevisiae genomes, associated with Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Peter J, De Chiara M, Friedrich A, Yue JX, Pflieger D, Bergström A, Sigwalt A, Barre B, Freel K, Llored A, Cruaud C, Labadie K, Aury JM, Istace B, Lebrigand K, Barbry P, Engelen S, Lemainque A, Wincker P, Liti G, Schacherer J. Nature. 2018 Apr;556(7701):339-344. doi: 10.1038/s41586-018-0030-5. Epub 2018 Apr 11. PMID: 29643504.


332 budding yeasts associated with Tempo and Mode of Genome Evolution in the Budding Yeast Subphylum. Shen XX, Opulente DA, Kominek J, Zhou X, Steenwyk JL, Buh KV, Haase MAB, Wisecaver JH, Wang M, Doering DT, Boudouris JT, Schneider RM, Langdon QK, Ohkuma M, Endoh R, Takashima M, Manabe RI, Čadež N, Libkind D, Rosa CA, DeVirgilio J, Hulfachor AB, Groenewald M, Kurtzman CP, Hittinger CT, Rokas A. Cell. 2018 Nov 29;175(6):1533-1545.e20. doi: 10.1016/j.cell.2018.10.023. Epub 2018 Nov 8. PMID: 30415838. (Figshare corresponding to the paper)

http://fungalgenomes.org/

http://1000.fungalgenomes.org/home/

http://fungidb.org/fungidb/ (about it –> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3245123/)

http://genome.jgi.doe.gov/programs/fungi/1000fungalgenomes.jsf <— nice graphic of situation related to 1000 fungal genomes project

http://genome.jgi-psf.org/programs/fungi/index.jsf

http://www.broadinstitute.org/scientific-community/science/projects/fungal-genome-initiative/fungal-genomics

http://fungi.ensembl.org/index.html

http://en.wikipedia.org/wiki/List_of_sequenced_fungi_genomes <– how current is it???

For genomic arrangement (synteny) comparisons/Fungal Genomics Resources

Synteny Viewer listed under every SGD gene on Sequence tab, near bottom of page

http://www.genomicus.biologie.ens.fr/genomicus-fungi-19.01/cgi-bin/search.pl

Yeast Gene Order Browser (YGOB)


RNA Structure Analysis

  • rna-tools - (previously known as 'rna-pdb-tools'): a toolbox to analyze sequences, structures and simulations of RNA. (It takes some navigating around to find what you want because a lot is there.)


Analyze DNA curvature

  • bendit-binder - use the Bend.it software to predict DNA curvature from DNA sequences with the power of the Jupyter ecosystem served via MyBinder.org.

Sequence Logo Generation

Installable software for fine-tuning sequence alignments

Windows equivalent is here but I have NOT tried it.

Python-based utilities

  • seqmagick-An imagemagick-like frontend to Biopython SeqIO. For example, it can convert from fasta to phylip, remove gaps from a fasta-formatted sequence, and describe all FASTA files in the current directory. Requires Biopython.
  • See also the 'Binder'/notebook-related items earlier on this page, since I usually have worked out Python code to bring the output of other command-line-based software into Python, and the notebook-related items here, as I sometimes demonstrate script usage in launchable notebooks.

My own sequence work-related code

Proteopedia Page Contributors and Editors

Wayne Decatur