Institute for Mathematical Sciences                                        Programs & Activities



Protein diversity as a result of transcript isoform variation in expressed genes

Winston Hide, Vladimir N. Babenko, Peter A. van Heusden, and Janet F. Kelso

South African National Bioinformatics Institute, University of the Western Cape, private bag x17, Bellville, South Africa, 7535. winhide(AT)


Completion of the human genome sequence has provided evidence for a gene count with a lower bound of 27 500 – 40 000. Significant protein complexity encoded from this gene set must derive in part from multiple transcript isoforms. Recent studies utilizing ESTs and reference mRNA data have revealed that alternate transcription, including alternative splicing, polyadenylation and transcription start sites, occurs within at least 30-40% of human genes. This is likely an underestimate, as EST sampling has been used to derive the figures. Transcript form surveys have yet to integrate the genomic context, expression, and contribution to protein diversity of isoform variation. Exhaustive manual confirmation of genome sequence annotation, coupled with comparison to available expressed sequence data, has been used here to accurately associate isoforms showing exon skipping with genomic sequence and so reveal potential protein coding alteration. In addition, relative expression levels of transcripts have been estimated from EST representation in the public databases. Our rigorous method has been applied to 545 described genes in the first intensive study of exon skipping based on chromosome 22 genome sequence and matched human transcripts. The study has led to the discovery of 62 exon skipping events in 52 genes, with 57 exon skips altering the protein coding region. A single gene (FBXO7) expresses an exon repetition. EST sampling analysis indicates that 58.8% of highly represented multi-exon genes are likely to express exon-skipped isoforms in ratios that vary from 1:1 to 1:>100. Comparisons with mouse show a similar overall level of skipping, although not at the same exon boundaries or genes. Analysis of cancer genes shows that aberrant forms of skipping may segregate with cancer expression libraries.



Inferring new gene and protein functions

Christian Schoenbach, RIKEN Genomic Sciences Center


Gene and protein functions comprise multiple transcriptional regulatory functions or protein-protein interactions. To investigate these multiple functional aspects we need to make better use of the rich a priori knowledge in biological databases and literature. First, I discuss results and problems of a semi-automated discovery pipeline for protein motifs that has been used to identify new motifs and infer functions from more than 21,000 conceptually translated mouse cDNA sequences. In the second part I focus on the development of a discovery pipeline for protein interactions in the context of disease associations, which includes a text information extraction module for PubMed abstracts.



Evidence of adaptive evolution provides the Achilles-heel of pathogen genomes: candidates for drug and vaccine targets

Junaid Gamieldien1, Betty Guo2, John J. Mekalanos2 and Winston A. Hide1

1 South African National Bioinformatics Institute, University of The Western Cape, S. Africa
2 Department of Microbiology and Molecular Genetics, Harvard Medical School, Boston, USA


Previous studies have shown that certain virulence factors of pathogenic bacteria, e.g. adhesins and outer membrane proteins, appear to be under positive selection. To determine whether new virulence-associated genes may be identified by screening for adaptive selection, we have performed an intraspecies search for orthologous genes that contain regions in which the rate of nonsynonymous substitution (KA) is greater than the rate of synonymous substitution (KS). We have analyzed pairs of strains of Helicobacter pylori and Neisseria meningitidis. A total of 85 genes undergoing pathoadaptive evolution were identified, of which a large proportion (41/85) code for known or potential virulence genes. Interestingly, it appears that specific cellular processes in N. meningitidis are under strong selection, as we have detected multiple genes involved in iron acquisition and DNA repair that have acquired adaptive mutations. Furthermore, of 21 H. pylori knockout mutants, 5 had decreased colonization efficiency and 8 were fatal and thus classified as being ‘putative essential’. Due to the demonstrated ability of our system to identify known and potential virulence factors, and the fact that 61% of H. pylori genes tested in gene-knockout experiments were shown to play important roles in the organism’s biology, we conclude that it may be an important tool for identifying novel drug and vaccine targets.



The assembly of the human and mouse genomes

Gene Myers, Informatics Research, Celera Genomics


"Assembling" a very large genome from shotgun sequencing data was considered impossible at the time Jim Weber and the speaker proposed it for the Human Genome in 1996. In mid-1998, Celera was formed. By late 1999, the informatics research team at Celera had assembled the 130Mbp Drosophila genome after producing a whole genome shotgun data set with enough reads to cover the genome 13 times over, a 13X data set. The team also produced an assembly of the Human genome and of the Mouse genome in the last three years. Their results from these projects at Celera prove unequivocally that whole genome shotgun sequencing is effective at delivering highly reliable reconstructions.



Pascal Matrices

Gilbert Strang, Massachusetts Institute of Technology, USA


This is joint work with Alan Edelman at MIT and a little bit with Pascal. They had all the ideas.

Put the famous Pascal triangle into a matrix. It could go into a lower triangular L or its transpose L' or a symmetric matrix S:

     [ 1 0 0 0 ]
L =  [ 1 1 0 0 ]
     [ 1 2 1 0 ]
     [ 1 3 3 1 ]

     [ 1 1 1 1 ]
L' = [ 0 1 2 3 ]
     [ 0 0 1 3 ]
     [ 0 0 0 1 ]

     [ 1 1  1  1 ]
S =  [ 1 2  3  4 ]
     [ 1 3  6 10 ]
     [ 1 4 10 20 ]

These binomial numbers come from a recursion, or from the formula for i choose j, or functionally from taking powers of (1 + x).
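The three sources of the binomial numbers can be compared directly. The following is a minimal sketch, not code from the talk: the function names are made up for illustration, and rows and columns are indexed from 0 so that entry (i, j) of L is i choose j.

```python
from math import comb  # Python 3.8+


def pascal_by_recursion(n):
    """Build L from the rule: each entry is the sum of the two entries above it."""
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        L[i][0] = 1
        for j in range(1, i + 1):
            L[i][j] = L[i - 1][j - 1] + L[i - 1][j]
    return L


def pascal_by_formula(n):
    """Build L from the formula for i choose j (zero above the diagonal)."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]


def pascal_by_powers(n):
    """Build L row by row from the coefficients of (1 + x)^i."""
    rows = []
    coeffs = [1]  # coefficients of (1 + x)^0
    for _ in range(n):
        rows.append(coeffs + [0] * (n - len(coeffs)))
        # Multiplying by (1 + x) adds the coefficient list to a shifted copy of itself.
        coeffs = [a + b for a, b in zip(coeffs + [0], [0] + coeffs)]
    return rows


for row in pascal_by_recursion(4):
    print(row)
```

All three functions return the same lower triangular matrix, which for n = 4 is the matrix L shown above.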

The amazing thing is that L times L' equals S. (OK for 4 by 4) It follows that S has determinant 1. The matrices have other unexpected properties too, that give beautiful examples in teaching linear algebra. The proof of L L' = S comes 3 ways, I don't know which you will prefer:

1. By induction using the recursion formula for the matrix entries.

2. By an identity for the coefficients i+j choose j in S.

3. By applying both sides to the column vector [ 1 x x^2 x^3 ... ]'.

The third way also gives a proof that S^3 = -I but we doubt that result.
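The claims L L' = S and det S = 1 can be checked numerically for small sizes. The sketch below is a verification under stated assumptions, not one of the three proofs: the helper names are hypothetical, entries are indexed from 0, and the determinant is computed by exact elimination with fractions to avoid rounding.

```python
from fractions import Fraction
from math import comb  # Python 3.8+


def lower_pascal(n):
    """L with entry (i, j) equal to i choose j."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]


def symmetric_pascal(n):
    """S with entry (i, j) equal to i+j choose j."""
    return [[comb(i + j, j) for j in range(n)] for i in range(n)]


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def determinant(A):
    """Exact determinant by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    det = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            det = -det  # a row swap flips the sign
        det *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return det


L = lower_pascal(4)
S = symmetric_pascal(4)
print(matmul(L, transpose(L)) == S)
print(determinant(S))
```

The determinant follows without computation as well: L and L' are triangular with ones on the diagonal, so each has determinant 1, and det S is their product.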

The rows of the "hypercube matrix" L^2 count corners and edges and faces and ... in n dimensional cubes.
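The counting property of L^2 can be checked against the standard count of k-dimensional faces of an n-cube, which is (n choose k) times 2^(n-k). The sketch below is illustrative only; the function names are hypothetical and rows are indexed from 0, so row n describes the n-dimensional cube.

```python
from math import comb  # Python 3.8+


def lower_pascal(n):
    """L with entry (i, j) equal to i choose j."""
    return [[comb(i, j) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def cube_face_counts(dim, width):
    """Number of k-dimensional faces of a dim-cube for k = 0 .. width-1."""
    return [comb(dim, k) * 2 ** (dim - k) if k <= dim else 0 for k in range(width)]


L = lower_pascal(4)
L2 = matmul(L, L)
for row in L2:
    print(row)
# Row 3 is [8, 12, 6, 1]: a cube has 8 corners, 12 edges, 6 faces and 1 interior.
```

Row 2 reads [4, 4, 1, 0], the 4 corners, 4 edges and 1 face of a square.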




Analysis, Models and Simulations

Pierre-Louis Lions, École Polytechnique, France


In this talk, we shall first present several examples of numerical simulations of complex industrial systems. All these simulations rely upon mathematical models involving Partial Differential Equations, and we shall briefly explain the nature, the history and the role of such equations. Then, some examples showing the importance of the mathematical analysis (i.e. "understanding") of those models will be presented. We shall conclude by indicating a few trends and perspectives.



