Protein Modeling 101

An educational resource about protein structure modeling

Figure: flowchart linking the chapters of this resource (Structure and Template Libraries, Protein Sequence Databases, Data Preparation and Analysis, Homology/Comparative Modelling Tools, Template Selection/Fold Recognition, De Novo Structure Prediction Tools, Hybrid Techniques, Model Validation and Quality Estimation, Confidence Estimation, Structure Visualization, Molecular Interactions, Molecular Motions, Applications).

Experimental Data

Structure Template Libraries

Protein structure homology modeling relies on the evolutionary relationship between the target (the protein of interest) and template proteins. To find suitable templates, a library of experimentally determined protein structures is searched.

Typically, structure template libraries are derived from the Protein Data Bank (PDB).

Protein resources on experimental structure information:

  • The Worldwide Protein Data Bank (wwPDB): The single global archive of experimental macromolecular structure data. [RCSB PDB] (USA); [PDBe] (Europe); [PDBj] (Japan)
  • CATH: A manually curated hierarchical domain classification of protein structures in the Protein Data Bank.

Amino Acid Sequences

  • UniProt: Protein Knowledgebase. A comprehensive, high-quality and freely accessible database of protein sequence and functional information.
  • RefSeq: NCBI Reference Sequence. A collection of curated, non-redundant genomic DNA, transcript (RNA), and protein sequences produced by NCBI.

Experimental Data Acquisition

A variety of data from diverse biochemical and biophysical experiments are available to characterize biomolecules (e.g., X-ray crystallography, NMR spectroscopy, electron microscopy, immuno-electron microscopy, footprinting, chemical cross-linking, FRET spectroscopy, small-angle X-ray scattering).

Experimental approaches can be combined with computational methods to characterize biomolecules [hybrid methods].

  • SBKB: Structural Biology Knowledgebase. A portal to protein structures, sequences, functions and methods.

Structure Modeling

Homology Modeling

Homology modeling (or comparative modeling) aims to build three-dimensional protein structure models using experimentally determined structures of related family members as templates. Here is a non-exhaustive list of homology modeling tools and services:

  • HHpred: Server for homology detection and structure prediction by HMM-HMM comparison.
  • I-TASSER: A server for protein structure and function prediction. 3D models are built from multiple-threading alignments by LOMETS and iterative TASSER assembly simulations.
  • M4T: Comparative Modelling using a combination of multiple templates and iterative optimization of alternative alignments.
  • Modeller: Software for homology or comparative modeling of protein three-dimensional structures. MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints.
  • ModWeb: A web server for automated comparative modeling that relies on PSI-BLAST, IMPALA and MODELLER.
  • PHYRE2: A fold recognition server for predicting the structure and/or function of your protein sequence.
  • SWISS-MODEL: Fully automated protein structure homology-modeling server accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).

Template Search/Fold Recognition

Potential structural templates are identified by searching a library of experimentally determined protein structures for proteins homologous to the target. Here is a non-exhaustive list of template detection tools and services:

  • BLAST/PSI-BLAST: Basic Local Alignment Search Tool and its iterative, profile-based variant (Position-Specific Iterated BLAST).
  • HHpred: Server for homology detection and structure prediction by HMM-HMM comparison.
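The idea of a template search can be illustrated with a toy Python sketch that ranks a small template library by sequence identity to the target. This is not how BLAST, PSI-BLAST or HHpred work: it swaps in a plain Needleman-Wunsch global alignment with assumed scores (match +1, mismatch -1, gap -2) instead of substitution matrices, sequence profiles or HMMs, and the function names and template IDs are hypothetical.

```python
"""Toy template ranking by sequence identity (illustration only).

Assumption: a plain Needleman-Wunsch global alignment with simple
match/mismatch/gap scores stands in for the profile- and HMM-based
searches that real template detection tools perform.
"""


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                 gap: int = -2) -> tuple[str, str]:
    """Return one optimal global alignment of a and b ('-' marks gaps)."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(target: str, template: str) -> float:
    """Identical aligned positions as a percentage of the target length."""
    if not target:
        raise ValueError("empty target sequence")
    aln_t, aln_s = global_align(target, template)
    identical = sum(1 for x, y in zip(aln_t, aln_s) if x == y and x != "-")
    return 100.0 * identical / len(target)


def rank_templates(target: str, library: dict[str, str]) -> list[tuple[str, float]]:
    """Sort template IDs by sequence identity to the target, best first."""
    scored = [(tid, percent_identity(target, seq)) for tid, seq in library.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical template library: IDs and sequences are made up.
    library = {"tmplA": "MKTAYIAKQR", "tmplB": "GSGSGSGSGS"}
    print(rank_templates("MKTAYIAKQR", library)[0])  # ('tmplA', 100.0)
```

Raw identity is used here only because it is easy to compute; the tools listed above report statistical scores such as E-values or probabilities, which are more informative for distant homologs.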

De novo prediction

In cases where no suitable template structure can be identified, de novo (a.k.a. ab initio) structure prediction methods can be used to generate three-dimensional protein models without relying on a homologous template structure:

  • Robetta: Full-chain protein structure prediction server based on the Rosetta method.
  • Rosetta: De novo protein structure prediction software.

Hybrid techniques

The goal of hybrid techniques is to contribute to a comprehensive structural characterization of biomolecules ranging in size and complexity from small peptides to large macromolecular assemblies. Detailed structural characterization of assemblies is generally not possible with any single existing experimental or computational method. This barrier can be overcome by hybrid approaches that integrate data from diverse biochemical and biophysical experiments:

  • CS-ROSETTA: A system for chemical-shift-based protein structure prediction using ROSETTA.
  • IMP: The Integrative Modeling Platform, software for comprehensive structural characterization of biomolecules.
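One way experimental data enter a hybrid model is as spatial restraints. The Python sketch below is a simplified illustration, not IMP's actual API: it scores a model against chemical cross-linking data by treating each cross-linked residue pair as an upper bound on the distance between their CA atoms. The 30 Å cutoff, the squared-violation penalty and all names are assumptions chosen for illustration.

```python
"""Score a structural model against cross-linking restraints (toy example).

Assumptions: each cross-link is an upper bound on the CA-CA distance of the
linked residues, the 30 Å default cutoff is illustrative, and violations are
penalized quadratically. None of this reproduces IMP's actual API.
"""
import math

Coord = tuple[float, float, float]


def crosslink_report(ca_coords: dict[int, Coord],
                     crosslinks: list[tuple[int, int]],
                     max_dist: float = 30.0) -> tuple[float, float]:
    """Return (fraction of satisfied cross-links, summed squared violation in Å^2)."""
    if not crosslinks:
        raise ValueError("no cross-links given")
    satisfied = 0
    penalty = 0.0
    for i, j in crosslinks:
        d = math.dist(ca_coords[i], ca_coords[j])
        if d <= max_dist:
            satisfied += 1
        else:
            # Quadratic penalty on the distance beyond the allowed maximum.
            penalty += (d - max_dist) ** 2
    return satisfied / len(crosslinks), penalty


if __name__ == "__main__":
    # Made-up CA coordinates (Å) for three residues along the x axis.
    coords = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0), 3: (50.0, 0.0, 0.0)}
    print(crosslink_report(coords, [(1, 2), (1, 3)]))
```

In this example, residues 1 and 3 are 50 Å apart, so that cross-link violates the assumed 30 Å bound by 20 Å while the other is satisfied. A hybrid method would combine such restraint terms with other data and search for models that keep the total penalty low.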

Confidence Estimation

Validation and Quality Estimation

Model quality assessment tools are used to estimate the reliability of predicted protein structure models. The accuracy of individual models may deviate significantly from the expected average quality, for instance because of suboptimal target-template alignments, low template quality, structural flexibility, or inaccuracies introduced by the modeling program. Each model therefore needs to be assessed individually. Here is a non-exhaustive list of tools and services:

  • ModEval: Model Evaluation Server, an evaluation tool for protein structure models.
  • ModFOLD: Model Quality Assessment Server.
  • ProQ2: A machine-learning-based predictor that uses a number of structural features to predict the quality of different parts of a protein model.
  • QMEAN: Server for model quality estimation.
  • QMEANBrane: Quality estimation tool for membrane protein structures.
  • SAVES: The Structure Analysis and Verification Server, which runs up to six different programs for structure validation.
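To give a feel for what a very simple geometric validation check looks like, the Python sketch below flags consecutive residues whose CA atoms are not about 3.8 Å apart, the typical spacing across a trans peptide bond. A flagged pair can point to a chain break, a cis peptide or a modeling error. This is a toy check with an assumed 0.3 Å tolerance and a hypothetical function name; the servers listed above evaluate many more properties, such as stereochemistry, packing and statistical potentials.

```python
"""Flag consecutive CA atoms whose spacing deviates from ~3.8 Å (toy check).

Assumptions: residues are listed in chain order, and the 0.3 Å tolerance
is an illustrative value, not a published validation threshold.
"""
import math

Coord = tuple[float, float, float]


def flag_ca_spacing(ca_trace: list[tuple[int, Coord]],
                    expected: float = 3.8,
                    tol: float = 0.3) -> list[tuple[int, int, float]]:
    """Return (residue_i, residue_j, distance) for suspicious consecutive pairs."""
    flagged = []
    for (res1, xyz1), (res2, xyz2) in zip(ca_trace, ca_trace[1:]):
        d = math.dist(xyz1, xyz2)
        if abs(d - expected) > tol:
            flagged.append((res1, res2, round(d, 2)))
    return flagged


if __name__ == "__main__":
    # Made-up trace: residues 1-2 are well spaced, 2-3 suggest a chain break.
    trace = [(1, (0.0, 0.0, 0.0)), (2, (3.8, 0.0, 0.0)), (3, (10.0, 0.0, 0.0))]
    print(flag_ca_spacing(trace))  # [(2, 3, 6.2)]
```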

Applications

Today, template-based protein models are routinely used in a broad spectrum of biological applications [reference].

Structure Visualization & Analysis

Here is a non-exhaustive list of tools for visualizing and analyzing macromolecular structures in PDB format:

  • DeepView/Swiss-PdbViewer: Software for protein structure analysis and visualization.
  • Jmol: An open-source Java viewer for chemical structures in 3D.
  • MolScript: A program for displaying molecular 3D structures.
  • OpenStructure: Open-Source Computational Structural Biology Framework.
  • POV-Ray: Open Source Raytracing software.
  • PyMOL: A Python-based open-source viewer for visualization of macromolecular structures.
  • UCSF Chimera: An extensible program for interactive visualization and analysis of molecular structures.
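All of these tools read the fixed-column PDB format. As a rough illustration of that format, the Python sketch below pulls the CA atoms of one chain out of ATOM records and derives the one-letter sequence. It relies on the PDB column layout (atom name in columns 13-16, residue name 18-20, chain ID 22, residue number 23-26, coordinates 31-54) and deliberately does not handle HETATM records, alternate locations, insertion codes or multi-model files; the function names are hypothetical.

```python
"""Read CA atoms and the one-letter sequence of one chain from PDB text.

Uses fixed PDB columns (1-based): record name 1-6, atom name 13-16,
residue name 18-20, chain ID 22, residue number 23-26, x/y/z 31-54.
Not handled: HETATM records, alternate locations, insertion codes,
multiple MODELs.
"""

THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def ca_atoms(pdb_text: str, chain: str = "A") -> list[tuple[int, str, tuple[float, float, float]]]:
    """Return (residue number, residue name, (x, y, z)) for each CA in the chain."""
    atoms = []
    for line in pdb_text.splitlines():
        # Python slices are 0-based, so columns 13-16 become line[12:16], etc.
        if line[0:6].strip() != "ATOM" or len(line) < 54:
            continue
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((int(line[22:26]), line[17:20], xyz))
    return atoms


def chain_sequence(pdb_text: str, chain: str = "A") -> str:
    """One-letter sequence from CA atoms; unknown residue names become 'X'."""
    return "".join(THREE_TO_ONE.get(name, "X") for _, name, _ in ca_atoms(pdb_text, chain))
```

For example, `chain_sequence(open("model.pdb").read(), "A")` would return the chain A sequence of a (hypothetical) file `model.pdb`. For real work, the structure libraries behind the tools above handle the many special cases this sketch skips.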

Molecular Interactions

Homology models are used in structure-based ligand discovery to investigate ligand-protein interactions, with the aim of finding ligands and improving their potency. One such technique, virtual screening, computationally screens large libraries of organic molecules for those that complement the structure of a protein binding site. Homology models can accelerate virtual screening and help suggest candidate ligands before crystal structures are available or experimental high-throughput screening begins.

Here is a non-exhaustive list of tools and services for virtual high-throughput screening (docking):

  • AutoDock: A suite of automated docking tools.
  • DOCK: A suite of programs for docking.
  • GOLD: A program for calculating the docking modes of small molecules in protein binding sites, provided as part of the GOLD Suite, a package of programs for structure visualisation and manipulation.
  • Glide: Software for high throughput virtual screening.
  • RosettaLigand: Software to predict how a protein and a small molecule interact, using Monte Carlo minimization.
  • SwissDock: A web service to predict molecular interactions that may occur between a target protein and a small molecule.
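Whichever docking program is used, a virtual screen typically ends with ranking compounds by score. The Python sketch below assumes the scores have already been computed and that lower (more negative) values are better, as with the estimated binding free energies in kcal/mol reported by AutoDock; the compound IDs, scores, optional cutoff and function name are invented for illustration.

```python
"""Pick the best-scoring compounds from a finished virtual screen.

Assumption: lower (more negative) docking scores are better, as with
estimated binding free energies in kcal/mol.
"""
import heapq
from typing import Optional


def top_hits(scores: dict[str, float], n: int = 10,
             cutoff: Optional[float] = None) -> list[tuple[str, float]]:
    """Return up to n (compound_id, score) pairs, best first.

    If cutoff is given, compounds scoring above it are discarded first.
    """
    candidates = [(cid, s) for cid, s in scores.items()
                  if cutoff is None or s <= cutoff]
    # nsmallest keeps only the n best entries, which is efficient for small n.
    return heapq.nsmallest(n, candidates, key=lambda item: item[1])


if __name__ == "__main__":
    # Invented compound IDs and scores, for illustration only.
    screen = {"cpd1": -7.2, "cpd2": -9.1, "cpd3": -5.0}
    print(top_hits(screen, n=2))          # [('cpd2', -9.1), ('cpd1', -7.2)]
    print(top_hits(screen, cutoff=-8.0))  # [('cpd2', -9.1)]
```

Docking scores are approximate, so top-ranked compounds are usually treated as candidates for experimental testing rather than confirmed binders.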

Molecular Motions

Here is a list of tools and databases for visualizing molecular movements: