Bioinformatics Primer


Education Article

  • Published: Mar 1, 2014
  • Channels: Laboratory Informatics / Proteomics & Genomics / Chemometrics & Informatics / Proteomics

By Dr Amna Butt


The accumulation of a staggering quantity of biological data in recent years has led to the co-evolution of a multifaceted discipline that covers everything from the processing and cataloguing of sequence data to the functional analysis of uncharacterised proteins. Bioinformatics combines mathematics, computer science and experimental biology into a single science for accomplishing these diverse objectives.

As soon as the sequence of a novel gene is deposited in a databank, computational tools are employed to determine its function, an approach occasionally referred to as computational genomics. Sequence alignment tools (Altschul et al., 1990) are a powerful first step in characterisation: they can be used either to search for conserved sequences, at the DNA or amino acid level, or to reveal homologous genes from other organisms that may be indicative of biochemical function. This procedure is reliable, rapid and cheap, but it fails in two instances: when a novel gene has no characterised homologues in any organism, and when proteins are only distantly related. Additionally, it is critical to bear in mind that sequence homology does not equate to functional similarity, and some sequence comparisons may yield misleading information (Strauss and Falkow, 1997).
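To make the idea of a local alignment concrete, the sketch below implements the classic Smith-Waterman dynamic-programming algorithm. It is not the BLAST method of Altschul et al. (1990), which uses a faster heuristic to approximate this kind of search across a whole database. The scoring values (match, mismatch, gap) and the two toy sequences are assumptions chosen purely for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for the best local alignment.

    Scoring parameters are illustrative defaults, not values from any
    published scoring matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] holds the best score of a local alignment ending at a[i-1], b[j-1].
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until the score drops to zero.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# Hypothetical toy sequences sharing one conserved stretch.
score, top, bottom = smith_waterman("TTTGATTACAGGG", "CCGATTACACC")
print(score, top, bottom)
```

A high-scoring local alignment such as this only indicates sequence similarity. As noted above, it is a starting hypothesis about function, not proof of it.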

Computational genomics can also be used to probe the entire DNA sequence of an organism and reveal the presence of a conserved sequence (either amino acid or DNA) that denotes a particular function of interest. Strauss and Falkow (1997) successfully used this approach to identify candidate genes important for virulence in the H. influenzae genome, confirming the computational data empirically. Fetrow and Skolnick (1998) extended sequence alignment tools with threading algorithms that allow structural prediction of proteins. Their algorithm examined the sequence and predicted a structure by aligning it with a structural database; an active site was then identified in the predicted structure. This concept was used to screen the E. coli genome for genes with the thiol-disulphide oxidoreductase activity of the glutaredoxin/thioredoxin protein family. The algorithm identified ten sequences corresponding to proteins known to have this activity, along with two novel ORFs. However, the same precautions that apply to alignment tools must be observed when using this form of computational functional analysis.
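The simplest form of such a screen is a plain sequence-motif scan, sketched below. This is far cruder than the structure-based threading approach of Fetrow and Skolnick (1998) and is not their method. It assumes the CXXC pattern (two cysteines separated by any two residues) that is characteristic of thioredoxin-like active sites. The ORF names and sequences are hypothetical toy data, not real genome entries.

```python
import re

# CXXC: two cysteines separated by any two residues. The lookahead lets
# overlapping occurrences be reported as well.
CXXC = re.compile(r"(?=(C..C))")


def find_motif(proteins, pattern=CXXC):
    """Map each protein name to a list of (1-based position, matched text)."""
    hits = {}
    for name, seq in proteins.items():
        found = [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]
        if found:
            hits[name] = found
    return hits


# Hypothetical translated ORFs used only to exercise the scan.
toy_orfs = {
    "orf_a": "MSDKIIHLWCGPCKMIAP",
    "orf_b": "MKTAYIAKQRQISFVKSHFSRQ",
    "orf_c": "MQTVIFGRSGCPYCVRAKDLA",
}

for name, found in find_motif(toy_orfs).items():
    print(name, found)
```

A pattern this short will match many unrelated proteins in a real genome, which is one reason structural context and empirical confirmation matter.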

Bioinformatics encompasses numerous disciplines besides computational genomics. The accessibility of databases such as YPD and SWISS-2DPAGE (Sanchez et al., 1995) via the world-wide web is attributable to bioinformatics. More importantly, many bioinformatics projects lead to the design and construction of tools such as TagIdent, PeptideSearch and SEQUEST (Wilkins et al., 1998; Nawrocki et al., 1998; Traini et al., 1998), used by researchers to identify proteins in large-scale proteomic research. In addition to these public resources, many laboratories use functional tools of their own, created to suit their purposes. Table 1 lists some of the world-wide web resources available to experimental biologists. It is by no means an exhaustive catalogue, simply a few of the tools currently available for use in functional genomics.


Table 1. Internet sites used for functional genomics. The sites may be used as resources for numerous applications in functional genomics.

| Name | Resources | Web Link |
|------|-----------|----------|
| EMBL | Genome sequence database | http://www.embl-heidelberg.de/ |
| GenBank | Genome sequence database. Links to sequence alignment tools | http://www.ncbi.nlm.nih.gov/Genbank/ |
| BioMedSearch.com | Biomedical literature search that combines MedLine/PubMed data with data from other sources | http://www.biomedsearch.com/ |
| SRS | Access to various databanks in molecular biology | http://srs.ebi.ac.uk/ |
| ExPaSy | Analysis of protein sequences and structures | http://www.expasy.ch/ |
| Saccharomyces genome database | Yeast genome database | http://www.yeastgenome.org/ |
| MIPS | Protein sequence database | http://mips.gsf.de/ |
| PIR | Protein sequence database | http://pir.georgetown.edu/ |
| YPD | Yeast protein database | http://www.proteome.com/YPDhome.html |
| SWISS-2DPAGE | 2-D gel database of various organisms | http://www.expasy.ch/ch2d/ |
| Yeast 2-D Gel | Yeast 2-D gel databank | http://www.ibgc.u-bordeaux2.fr/YPM |
| 2DWG meta-base | Links to numerous 2-D databases | http://www-lecb.ncifcrf.gov/2dwgDB/ |
| ExPaSy Proteomics tools | Tools for protein identification using mass spectrometry | http://www.expasy.ch/tools/ |
| ProFound | Protein chemistry and mass spectrometry resource (PROWL) | http://prowl.rockefeller.edu/ |
| ProteinProspector | Peptide mass search tools from UCSF | http://donatello.ucsf.edu/ |
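Peptide mass search tools of the kind listed in Table 1 rest on a simple calculation: digest a candidate protein in silico and compare the predicted peptide masses with those measured by mass spectrometry. The sketch below illustrates that principle only and does not reproduce the algorithm of any named tool. It assumes the common trypsin rule (cleave after K or R unless the next residue is P), standard monoisotopic residue masses, and no missed cleavages or modifications. The function names, the mass tolerance and the example sequence are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(seq):
    """Split a protein after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified, uncharged peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def match_masses(observed, seq, tol=0.01):
    """Return predicted peptides whose mass lies within tol Da of an observed mass."""
    matched = []
    for pep in tryptic_digest(seq):
        mass = peptide_mass(pep)
        if any(abs(mass - obs) <= tol for obs in observed):
            matched.append(pep)
    return matched


# Hypothetical sequence and a single hypothetical observed mass.
protein = "MKRPAKG"
print(tryptic_digest(protein))
print(match_masses([peptide_mass("RPAK")], protein))
```

Real search tools score matches across an entire database and allow for missed cleavages, modifications and instrument error, so a match from a sketch like this would only nominate a candidate protein.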

 

References

Altschul S.F., W. Gish, W. Miller, E.W. Myers and D.J. Lipman (1990). Basic local alignment search tool. J. Mol. Biol. 215: 403-410.

Strauss E.J. and S. Falkow (1997). Microbial pathogenesis: genomics and beyond. Science 276: 707-712.

Fetrow J. and J. Skolnick (1998). Method for prediction of protein function from sequence using the sequence-to-structure-to-function paradigm with application to glutaredoxin/thioredoxin and T1 ribonuclease. J. Mol. Biol. 281: 949-968.

Sanchez J.-C., R.D. Appel, O. Golaz, C. Pasquali, F. Ravier, A. Bairoch and D.F. Hochstrasser (1995). Inside SWISS-2DPAGE database. Electrophoresis 16: 1131-1151.

Wilkins M.R., E. Gasteiger, L. Tonella, K. Ou, M. Tyler, J.C. Sanchez, A.A. Gooley, B.J. Walsh, A. Bairoch, R.D. Appel, K.L. Williams and D.F. Hochstrasser (1998). Protein identification with N and C-terminal sequence tags in proteome projects. J. Mol. Biol. 278: 599-608.

Nawrocki A., M.R. Larsen, A.V. Podtelejnikov, O.N. Jensen, M. Mann, P. Roepstorff, A. Gorg, S.J. Fey and P.M. Larsen (1998). Correlation of acidic and basic carrier ampholyte and immobilized pH gradient two-dimensional gel electrophoresis patterns based on mass spectrometric protein identification. Electrophoresis 19: 1024-1035.

Traini M., A.A. Gooley, K. Ou, M.R. Wilkins, L. Tonella, J.C. Sanchez, D.F. Hochstrasser and K.L. Williams (1998). Towards an automated approach for protein identification in proteome projects. Electrophoresis 19: 1941-1949.


Copyright © 2014 John Wiley & Sons, Inc. All Rights Reserved