Introduction to Molecular Modeling and Computer Simulation
S. Ravichandran, Ph.D.
Date: March 14 (Wednesday), 2007 [5:30-7:30 PM]
Class Location: ABCC (Bldg 430), NCI-Frederick, MD 21702, USA
Goal: The course introduces the basic concepts of molecular modeling and simulation.
We will cover a variety of topics, including visualization, molecular mechanics,
several simulation techniques (Molecular Mechanics, Molecular Dynamics etc.),
and analysis of the model structure. At the end, attendees will work through a
short tutorial on the topics discussed in class.
Click
here
to download the recent class slides (pdf file) and hands-on session
notes
(pdf file). The data files used in the hands-on exercise can be downloaded
from
here.
Software: DS Viewer Pro 6.0 (Accelrys Inc.) will be used to
do the hands-on exercises.
Click
here
to go back to my class web-page.
Molecular Modeling Hands-on Exercises using
DS ViewerPro 6.0
Click
here
(PDF) to view the recent class notes and hands-on exercises
using DS ViewerPro 6.0
Molecular Modeling Hands-on Exercises using Sybyl
Use Sybyl (6.9.2) for the following two exercises.
- Exercise 1: Building the Aspirin
(2-(acetyloxy)benzoic acid) molecule (the SMILES string for aspirin is CC(=O)Oc1ccccc1C(=O)O).
We will load a pre-built aspirin mol2 file (the Sybyl file type) into Sybyl.
The pre-built aspirin molecule has intentionally been given
some incorrect atom types. In this
example we will correct the atom types, add partial charges,
and use a molecular mechanics method to energy-minimize the molecule.
Finally, we will display the
total energy of the minimized aspirin molecule.
This exercise is designed to illustrate the basic assumptions
of molecular modeling using a simple system. It will also answer
questions such as:
What are atom-types?
Why do we have to worry about them?
How are partial charges calculated?
What are the different methods of calculating them?
What is a force field? Which one should we choose?
How is energy minimization carried out?
How do we calculate the total energy of aspirin and list its
different energy components (electrostatic, van der Waals etc.)?
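To make the last question concrete, here is a minimal Python sketch of two non-bonded terms that most molecular mechanics force fields include: a Coulomb (electrostatic) term and a 12-6 Lennard-Jones (van der Waals) term. These are generic textbook forms, not necessarily the exact expressions of the force field used in Sybyl. The function names, the single shared epsilon/sigma pair, and the numerical values are assumptions chosen only for illustration.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); a commonly quoted value.
COULOMB_K = 332.0637


def coulomb_energy(q1, q2, r, dielectric=1.0):
    """Electrostatic energy (kcal/mol) of two point charges (in e) at distance r (Angstrom)."""
    return COULOMB_K * q1 * q2 / (dielectric * r)


def lennard_jones_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones energy: epsilon is the well depth, sigma the zero-crossing distance."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def nonbonded_components(coords, charges, epsilon, sigma):
    """Sum both terms over all atom pairs.

    Simplification (assumption): every pair shares one epsilon/sigma and no
    bonded pairs are excluded, which a real force field would not do.
    """
    electrostatic = 0.0
    van_der_waals = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            electrostatic += coulomb_energy(charges[i], charges[j], r)
            van_der_waals += lennard_jones_energy(r, epsilon, sigma)
    return {
        "electrostatic": electrostatic,
        "van_der_waals": van_der_waals,
        "total": electrostatic + van_der_waals,
    }


# Two hypothetical atoms 3.4 Angstrom apart with made-up partial charges.
components = nonbonded_components(
    coords=[(0.0, 0.0, 0.0), (3.4, 0.0, 0.0)],
    charges=[0.5, -0.5],
    epsilon=0.2,
    sigma=3.4,
)
for name, value in components.items():
    print(f"{name}: {value:.3f} kcal/mol")
```

A complete force field adds bonded terms (bond stretching, angle bending, torsions) to these non-bonded contributions, which is why the energy report lists several components.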
- File >> Read >> Choose "File Type:molecule"
>> choose "basaspirin.mol2" >> Hit "OK"
Note: Use the DepthCue option to make your molecule look brighter.
The DepthCue option is accessible from the column of
menu options located on the left.
- View >> Label Atom Name... >> Click "All" >> Hit "OK"
Note: The label size can be changed
by clicking on the box labelled "Font"
located at the far left
of the Sybyl main window.
- View >> Label Charges...
- View >> Label Atom Type... >> Click "All" >> Hit "OK"
- As indicated above, some of the atom types of aspirin are
not correct. For example, carbon atoms belonging to aromatic
rings should have the atom type C.ar.
To fix/correct the atom-types do the following:
- Build/Edit >> Modify >> Atom...>> ONLY_TYPE >> hit "OK" >>
Click "All" >> Hit "OK". This will select all the atoms and
open the "Option" window.
- As the green marker moves through the atoms of aspirin, choose the
proper atom type for each atom (see figure for correct atom types) and hit "OK".
Aspirin with the correct Sybyl atom types
is shown in the displayed figure.
- Build/Edit >> Add >> Hydrogens
- Compute >> Charges >> Gasteiger-Huckel >> Answer "No" to
the question "Do you want to change formal charges before
computing charges?"
- Compute >> Minimize... >> Click "Energy Setup: Modify" button
>> Charges "Use Current" >> Hit "OK" >> Again hit "OK". Click
here to see the minimized structure displayed with a VDW dot surface.
- Compute >> Energy >> Hit "Compute" to compute the energy.
Type "C" to see the additional output.
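The minimization step in this exercise can be pictured with a toy example. The sketch below applies steepest descent to a single harmonic bond-stretch term, E(r) = 0.5 * k * (r - r0)^2. It only illustrates what "energy minimization" means; the force constant, reference length, and step size are hypothetical values, and this is not a description of the minimizer or force field that Sybyl actually uses.

```python
def bond_energy(r, k=300.0, r0=1.40):
    """Harmonic bond-stretch energy; k and r0 are hypothetical parameters."""
    return 0.5 * k * (r - r0) ** 2


def bond_gradient(r, k=300.0, r0=1.40):
    """Derivative of the harmonic energy with respect to the bond length r."""
    return k * (r - r0)


def steepest_descent(r, step=0.001, tol=1e-6, max_iter=10_000):
    """Move downhill along the negative gradient until the gradient is below tol."""
    iteration = 0
    for iteration in range(max_iter):
        grad = bond_gradient(r)
        if abs(grad) < tol:
            break
        r -= step * grad
    return r, iteration


# Start from a deliberately stretched bond and relax it.
r_start = 1.60
r_min, n_steps = steepest_descent(r_start)
print(f"start:     r = {r_start:.4f}, E = {bond_energy(r_start):.4f}")
print(f"minimized: r = {r_min:.4f}, E = {bond_energy(r_min):.2e} ({n_steps} steps)")
```

In a real molecule the same idea is applied to all atomic coordinates at once, with the gradient taken from the full force-field energy rather than a single term.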
- Exercise 2: How to align two protein structures using homology?
For this exercise,
we will be aligning two forms of
asparaginases (1wsa,1hfw) using the biopolymer module of
Sybyl (6.9.2). Click on the
respective pdb entries to download the asparaginases.
- 1wsa: Wolinella succinogenes L-Asparaginase II
- 1hfwAC: Erwinia chrysanthemi L-Asparaginase (A & C chains only).
- File >> Read >> 1wsa.pdb >> Hit "OK" >> Answer "No" when asked whether to
center the molecule.
- File >> Read >> 1hfwac.mol2. This molecule will be placed in the M2 area.
- View >> Color >> Atoms.. >> Choose M1 & Left-Click All >> OK >> Cyan
- View >> Color >> Atoms.. >> Choose M2 & Left-Click All >> OK >> Yellow
Click here to see the figure.
- Biopolymer >> Display >> C-alpha only >> Choose M1 & hit "OK"
- Biopolymer >> Display >> C-alpha only >> Choose M2 & hit "OK"
- Build/Edit >> Delete >> Atom... >> Left-click to select M1 >>
Click on "Sets..." >> select "WATER" >> hit "OK"
- Build/Edit >> Delete >> Atom... >> Left-click to select M2 >>
click on "Sets..." >> select "WATER" >> hit "OK"
- Biopolymer >> Align Structures Using Homology... >> To Specify Fixed
protein (highlight m1), and to specify Movable protein (highlight m2).
Choose "Calpha" as the type of atoms used for fitting, and click
"Align"
- After a brief pause, the main window shows the aligned proteins. Also
watch the command prompt window, which lists the distance between each pair of
matched residues along with the Weighted Root Mean Square Distance value.
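For reference, a weighted root-mean-square distance over paired atoms (for example, matched C-alpha atoms after superposition) can be written as sqrt(sum(w_i * d_i^2) / sum(w_i)). The Python sketch below uses this general definition. How Sybyl chooses its weights is not described here, so the `weights` argument is an assumption, and the coordinates are made-up values.

```python
import math


def weighted_rmsd(coords_a, coords_b, weights=None):
    """Weighted RMS distance between two equally long lists of (x, y, z) points.

    The structures are assumed to be superimposed already; this function
    only measures the residual distances, it does not perform the fit.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("both structures must contain the same number of atoms")
    if weights is None:
        weights = [1.0] * len(coords_a)  # unweighted case
    weighted_sq = sum(
        w * math.dist(a, b) ** 2 for a, b, w in zip(coords_a, coords_b, weights)
    )
    return math.sqrt(weighted_sq / sum(weights))


# Hypothetical C-alpha positions of two matched residues in each protein.
fixed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]

print(f"unweighted RMSD: {weighted_rmsd(fixed, moved):.4f}")
print(f"weighted RMSD:   {weighted_rmsd(fixed, moved, weights=[1.0, 3.0]):.4f}")
```

Giving a larger weight to the second (perfectly matched) residue lowers the reported value, which is the practical effect of weighting well-conserved positions more heavily.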
- Exercise 3: Protein Visualization using Sybyl.
For instructions, click
here.
Use Accelrys InsightII for the following exercise
- Exercise 1: Hydrophobicity calculation, surface creation and visualization using InsightII
Take Home Exercise: Case Study
Aim: In this case study, we will start with a fragment of a DNA sequence
which encodes a protein. We will translate the nucleic acid sequence and
identify the protein using the standard bioinformatics
techniques. Upon identifying the protein, we will extract the whole protein
sequence and use the analysis tools to find sequence neighbours.
We will also search for patterns, domains and identify any fingerprints in the model
sequence. Finally, we will predict the secondary structure and if the structure is unknown,
exploit the homology to build
the 3-D structure. We will further analyze the protein structure using
popular software such as InsightII and Sybyl.
Mystery Model (nucleotide sequence)
aaaaatgaac caataattac aagttcaaaa
Translation
- For the first part of this exercise,
we will use the composite database,
OWL.
Use the analysis tools interface to submit the above
nucleic acid query, see
Fig 1. The six translated
frames are shown
here in
Fig 2.
In the previous figure the stop
codons are denoted by the symbol (!). Ignoring the sequences with stop codons,
we are left with forward0 and reverse1 amino acid sequences. Which is the
correct sequence?
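The six-frame translation described above can also be reproduced with a short Python sketch. It uses the standard genetic code, marks stop codons with "!" as in the figure, and names the frames forward0..forward2 and reverse0..reverse2. The frame-naming convention of the web tool is assumed to be "offset into the strand"; the function and variable names are illustrative only.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "!" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY!!CC!WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; trailing bases are ignored."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def six_frames(dna):
    """Translate all three forward and three reverse reading frames."""
    dna = dna.replace(" ", "").upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"forward{offset}"] = translate(dna[offset:])
        frames[f"reverse{offset}"] = translate(rc[offset:])
    return frames


MYSTERY = "aaaaatgaac caataattac aagttcaaaa"
frames = six_frames(MYSTERY)
for name in sorted(frames):
    note = "" if "!" in frames[name] else "  <- no stop codon"
    print(f"{name}: {frames[name]}{note}")
```

Only the frames without a stop codon remain candidates, which is the filtering step applied to the figure.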
Identifying the correct fragment
-
To start, we will take the forward0 sequence,
KNEPIITSSK,
and search the nr (NCBI) database. Since we want an exact match,
we will choose Protein BLAST and use the "Search for short,
nearly exact matches" option.
The results of the search are shown in
Fig 3. They indicate that
KNEPIITSSK
is the correct protein sequence. To see the results of the BLAST search for
reverse1 (FELVIIGSF), click
Fig 4. Similar
results were obtained
when searching the OWL composite database.
Identifying the family & Looking for patterns
- We conclude that forward0
is the correct partial sequence. From the NCBI (nr) search we found that
the partial sequence is part of a protein similar to
Bacteriophage T4
Lysozyme. BLAST is not guaranteed to give clear-cut results,
so to confirm the result, scan the sequence for patterns using the
PROSITE database
(Fig. 5).
To see the results click
here (Fig.6).
Also, from the PFAM database, we identify that this sequence belongs to the
"Phage Lysozyme" family (Accession number: PF00959). Click here for the
PFAM
results (Fig. 7).
Swiss-Prot also categorises
the parent protein of our sequence as a hypothetical protein, meaning
that there is no experimental evidence that this protein is expressed in
vivo. For more information on the results of the
Swiss-Prot search, click
here (Fig.8).
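To show what a PROSITE-style pattern scan does, the Python sketch below converts a pattern written in PROSITE syntax into a regular expression and searches a sequence with it. The pattern used here is a made-up toy pattern, not a real PROSITE entry, and the converter handles only the common syntax elements (x, ranges in parentheses, [allowed] and {forbidden} residues, and the < and > anchors outside brackets).

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-syntax pattern into a Python regular expression."""
    regex = pattern.rstrip(".").replace("-", "")       # drop element separators
    regex = regex.replace("x", ".")                    # x = any residue
    regex = regex.replace("{", "[^").replace("}", "]")  # {P} = any residue except P
    regex = regex.replace("(", "{").replace(")", "}")   # x(2,4) = repeat 2 to 4 times
    regex = regex.replace("<", "^").replace(">", "$")   # N- and C-terminal anchors
    return regex


# Hypothetical toy pattern, written only to demonstrate the syntax.
TOY_PATTERN = "K-N-x(2)-[IV]-{P}-T"
FRAGMENT = "KNEPIITSSK"  # the forward0 fragment from this case study

toy_regex = prosite_to_regex(TOY_PATTERN)
match = re.search(toy_regex, FRAGMENT)
print("regex:", toy_regex)
if match:
    print(f"match '{match.group()}' at residues {match.start() + 1}-{match.end()}")
```

Real PROSITE patterns are usually short and can match by chance, which is one reason the case study cross-checks the hit against PFAM and the other secondary databases.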
The full
protein sequence in fasta format is shown below
>tr|Q8T1H7 Hypothetical 19.4 kDa protein - Dictyostelium discoideum (Slime mold).
MVSSIKDMLKYDEGEKLEMYKDTEGYYTIGIGHLITRIKERNAAILSLEEKIGHKVKMDS
KNEPIITSSKSEALFEKDLSVATKSIESNPTLSTIYKNLDNIRKMAIINMVFQMGVNNVL
TFKMSLKLIEEKKWAEAAKEMKNSTWNHQTPNRSNRVISVIETGTLNAYK
Click
here (seq.fasta)
to download the protein sequence.
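As a quick sanity check, the FASTA record above can be parsed and searched for the translated fragment with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `read_fasta` is an assumption, and the record is pasted inline rather than read from seq.fasta.

```python
FASTA_TEXT = """\
>tr|Q8T1H7 Hypothetical 19.4 kDa protein - Dictyostelium discoideum (Slime mold).
MVSSIKDMLKYDEGEKLEMYKDTEGYYTIGIGHLITRIKERNAAILSLEEKIGHKVKMDS
KNEPIITSSKSEALFEKDLSVATKSIESNPTLSTIYKNLDNIRKMAIINMVFQMGVNNVL
TFKMSLKLIEEKKWAEAAKEMKNSTWNHQTPNRSNRVISVIETGTLNAYK
"""


def read_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


records = read_fasta(FASTA_TEXT)
header, sequence = next(iter(records.items()))

fragment = "KNEPIITSSK"
start = sequence.find(fragment)  # 0-based index, -1 if the fragment is absent

print(header)
print("length:", len(sequence))
if start >= 0:
    print(f"fragment found at residues {start + 1}-{start + len(fragment)}")
```

The fragment sits at the start of the second sequence line of the record, so the translated forward0 frame is indeed an internal stretch of the full protein.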
Searching the BLOCKS database
The URL for
BLOCKS database
is http://blocks.fhcrc.org/blocks/blocks_search.html .
BLOCKS is an automatically generated database of ungapped
multiple sequence alignments that correspond to the most
conserved regions of proteins.
For results, click
here.
Searching the PRINTS database
The PRINTS database defines fingerprints for protein families.
It stresses that a group of motifs defines a protein
family. The web link for the
PRINTS database is
http://www.bioinf.man.ac.uk/dbbrowser/PRINTS/
To see the search results, click
here. For the graphic display
of GRAPHScan, click
here.
Homology Modeling: Identifying the template(s)
Swiss-Model was used to homology model the protein sequence. The results of the Swiss-Model can be obtained by
clicking
here. The model protein
in PDB format can be obtained by clicking
here. To visualize the model
protein, click
here.
The PSI-BLAST results show that the most closely related
3D structure
is 1L24; for the final results, click
here (Fig.11).
Analysis
- Compare the results from Pfam, PRINTS and BLOCKS.
Do the results agree with each other?
- Is the Swiss-Model result acceptable? Did it predict the 3D structure
for the entire input amino acid sequence?
- Predict the secondary structure using the PredictProtein
server. Does the secondary structure prediction agree with the
3D structure built by Swiss-Model?
- Evaluate the quality of the structure. How about using Procheck
for the structure evaluation?
- Calculate some of the important properties, such as
solvent accessible surface area (SAS), Delphi electrostatic properties, and the
location of hydrophobic and hydrophilic surface patches (if any).
Modeling Resources
Visualizer
- SPDBV Swiss-PDB Viewer
- Cn3D See in 3-D
- Rasmol Rasmol
Computational Chemistry/Biology Software
- Sybyl Tripos (Flexi-Dock, QSAR, Gen-Fold etc.) ($$$$$)
SYBYL Local information on using SYBYL on ABCC computers.
- InsightII Accelrys (GCG, InsightII, Homology,
Modeler etc.) ($$$$)
Insight II Local information on using InsightII on ABCC computers.
- AutoDock Automated Docking of Flexible
Ligands to Macromolecules (Scripps Research Institute)
- DOCK Small Molecule Receptor docking (UCSF)
- Amber Assisted Model Building and Energy
Refinement. Biomolecular MM and MD simulations.
- GROMACS GROMACS: The Fastest Molecular Dynamics
Program.
- VMD Visual Molecular Dynamics software
(NIH)
Literature Survey
- Medline NCBI-Medline
Databases
- GenBank NCBI-Gene Bank
- GenScan.
Gene Scanning and identification.
- Proteomics Tools ExPASy Proteomics tools
(Protein Identification, DNA -> Protein, Similarity Searches,
Post-translational modification prediction, Primary/Secondary/Tertiary
structure prediction and many more)
-
Protein Machine
Translation tools at European Bioinformatics Institute, UK
- EMBL European Molecular
Biology
- European Bioinformatics
Institute (EBI)
- MIPS EMBnet Special Node, MIPS
-
BSM Biomolecular and Structure Modelling Group, UCL, UK
-
Swiss-Prot
ExPASy: Access to databases Swiss-Prot, TrEMBL, Prosite, SWISS-2DPAGE, ENZYME,
SWISS-MODEL
Repository, CD40Lbase, SeqAnalRef from ExPASy home page
- TrEMBL The TrEMBL database contains translations of all the
coding sequences (CDS) from the EMBL nucleotide sequences.
- PIR Protein Information Resource
- nrl3d NRL-3D
- RESID
RESID database is a comprehensive collection of annotations and structures
for post-translational modifications.
-
EMBNET Switzerland Blast, Box-Shade, DotPlot, T-COFFEE and many more
Composite Databases
- NR
All non-redundant GenBank CDS translations+RefSeq Proteins
+PDB+SwissProt+PIR+PRF
- OWL
Non-redundant database built from four publicly available sources:
Swiss-Prot, PIR(1-3), GenBank (translation) and NRL-3D.
-
UNIPROT Universal Protein Resource [Swiss-Prot, PIR and TrEMBL]
-
Swiss-Prot+TrEMBL Swiss-Prot Composite Database
Sequence Comparison
- BLAST NCBI Blast
- Dotlet Swiss-Institute for Experimental Cancer
Research, a founding member of Swiss Institute for Bioinformatics.
- Pfam MSA C2H2 zinc finger Domain multiple
sequence alignment.
Secondary Databases
- Prosite Database of
protein families and domains.
- PRINTS
Protein FingerPrint Database. Search initiated from BLOCKS database uses
PRINTS fingerprint database
- BLOCKS
Uses INTERPRO database from EMBL
- PFAM
Protein families database of alignments and Hidden Markov Models
- PRODOM
Comprehensive protein domain families obtained
from Swiss-Prot and TrEMBL.
Structural Databases
- PDB Protein Data Bank
-
mmCIF
Information on macromolecular Crystallographic Information File
format
- MMDB Molecular Modeling DataBase: A database
of 3D structures, as well as tools for their visualization and analysis.
- SCOP
Structural Classification of Proteins
- Prof. Thornton-PDBsum
A database of known 3D structures of proteins and nucleic acids
- PDBREPORT
Database reports structural problems in PDB entries using WHAT_CHECK software
- CATH
Protein Structure Classification
- FSSP
Fold classification based on
Structure-Structure alignment of Proteins
Protein-Protein/Protein-DNA Interactions
- Protein-Protein interaction server
- DNA-Protein interaction server
Secondary Structure Prediction
- Predict Protein Server The Predict
Protein Server
- JPRED Server Barton Group Secondary Structure Prediction Server
- PSIPRED Secondary Structure Prediction Server
- EVA EVA continuously and automatically analyses protein structure
prediction servers in 'real time'
- SAM-T02 HMM-based Protein Structure Prediction
Identifying Transmembrane segments
- Protscale ExPASy web-site for hydrophobicity prediction
- TMHMM Predicting transmembrane regions using Hidden Markov Models.
This site is maintained by the Center for Biological Sequence Analysis (CBS), the Technical University
of Denmark.
- PHDhtm PHDhtm TRANSMEMBRANE HELICES PREDICTION
Homology Modeling
-
Swiss-Model An Automated Comparative Protein Modeling Server
-
Gene-Mine
Gene Analysis Engine and Viewer for Unix. Formerly called
Xlook.
- Modeller
Program for Comparative Protein Structure Modeling by
Satisfaction of Spatial Restraints
Structure Validation and Quality of protein models
-
Procheck Stereo-chemical quality of protein and residue by residue
analysis in figures
- PDBreport
Report of PDB files
- VADAR Volume, Area, Dihedral Angle Reporter
- DSSP Definition of secondary structure of proteins given a set of 3D coordinates
- Verify_3D Verify3D Structure Evaluation Server
Evolutionary Conservation in 3D protein structures
- Consurf Server Server for identification of
Functional Regions in Proteins
Structure Comparison
- SuperPose
- Deep-View or Swiss PDB Viewer Structure overlay can be carried out using Magic Fit module
- Other commercial tools include, Sybyl, InsightII etc.
Electrostatics and visualization
- Delphi Commercial Version of Delphi.
A module in InsightII.
- Grasp Graphical Representation and Analysis of
Structural Properties. A Molecular Visualization and analysis
program. It is particularly useful for the display and manipulation
of the surfaces of molecules and their electrostatic properties.
Force-Fields
- MM2,MM3,MM4 Force-Field
Drug Information
- Sigma_Aldrich
- United States Pharmacopeia
Biochemical Pathways
-
Biochemical Pathways aMAZE: Workbench
for the representation, management and analysis of information
on cellular processes, genetic and biochemical pathways, and signal
transduction.
-
ExPASy Biochemical Pathways
- KEGG
Kyoto Encyclopedia of
Genes and Genomes
Other Useful Resources
- OMIM Information about disease causing genes.
Recommended Books/Web-links
- Molecular Modeling: Basic Principles and Applications, Second Edition, H.-D. Holtje, W. Sippl,
D. Rognan, G. Folkers (2003)
- Sequence Analysis in a Nutshell:
A guide to Common Tools and Databases
Scott Markel, Kristine Conner, Darryl Leon (2003)
- Molecular Modeling and Simulation, T. Schlick (2002)
- Molecular Modeling, A.R. Leach (2001)
- Bioinformatics: A Practical Guide to the
Analysis of Genes and Proteins
Andreas D. Baxevanis, B. F. Ouellette (2001)
- Beginning Perl for Bioinformatics James D. Tisdall, Betsy Waliszewski (2001)
- Bioinformatics Basics: Applications in Biological Sciences and
Medicine, H.H. Rashidi and L.K. Buehler (2000)
- Bioinformatics Computer Skills, Cynthia Gibas and Per Jambeck (2000)
- Computer Simulation of Liquids, M.P. Allen & D.J. Tildesley (1989)
- Fortran 90/95 by S.J. Chapman
On-line Tutorials
Contact Info:
Dr. S. Ravichandran
Advanced Biomedical Computing Center,
NCI-Frederick,
NIH
Bldg 430, Frederick, MD 21702
Tel: 301-846-1991
Email: sravi at ncifcrf dot gov
Web: http://ncisgi.ncifcrf.gov/~ravichas
Web-Page Updated on May 8 2008