Institute for Mathematical Sciences



Heterogeneous Multiscale Methods

Bjorn Engquist, Princeton University


The heterogeneous multiscale method is a framework for a class of computational techniques for solving problems with a wide range of scales. We will discuss the theory of convergence as well as applications, for example to composite materials and to the coupling of molecular dynamics with fluid equations.


Multiscale Modeling and Computation

Thomas Yizhao Hou, California Institute of Technology


Many problems of fundamental and practical importance have solutions with multiple scales. Composite materials, flow and transport in porous media, and turbulent flow are examples of this type. Direct numerical simulation of these multiscale problems is extremely difficult due to the range of length scales in the underlying physical problems. Here, we introduce a dynamic multiscale method for computing nonlinear partial differential equations with multiscale solutions. The main idea is to construct semi-analytic multiscale solutions that are local in space and time, and to use them to build the coarse-grid approximation to the global multiscale solution. Such an approach overcomes the common difficulty associated with the memory effect and the non-uniqueness in deriving global averaged equations for incompressible flows with multiscale solutions. It provides an effective multiscale numerical method for computing incompressible Euler and Navier-Stokes equations with multiscale solutions. In a related effort, we introduce a new class of numerical methods for solving the stochastically forced Navier-Stokes equations. We will demonstrate that our numerical method can compute high-order statistical quantities accurately and more efficiently than the traditional Monte Carlo method.
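As a minimal illustration of why deriving averaged equations for multiscale problems is subtle, consider the textbook 1-D elliptic homogenization example (a standard illustration, not the dynamic method of the talk): for -(a(x/ε)u')' = f with a rapidly oscillating 1-periodic coefficient a, the effective coefficient of the averaged equation is the harmonic mean of a, not its arithmetic mean. The coefficient below is an illustrative choice.

```python
import math

def arithmetic_mean(a, n=100_000):
    # Left Riemann sum of a over one period [0, 1)
    return sum(a(i / n) for i in range(n)) / n

def harmonic_mean(a, n=100_000):
    # Reciprocal of the mean of 1/a over one period
    return 1.0 / (sum(1.0 / a(i / n) for i in range(n)) / n)

a = lambda y: 2.0 + math.sin(2.0 * math.pi * y)  # oscillatory, 1-periodic

print(arithmetic_mean(a))  # about 2.0
print(harmonic_mean(a))    # about sqrt(3), the correct effective coefficient
```

The gap between the two means is exactly why naively averaging the coefficient produces the wrong coarse-grid equation.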



Approximate Tail Probabilities for Scan Statistics

David Siegmund, Stanford University


Different problems in medical imaging, astrophysics, genetic mapping and DNA sequence analysis involve observations of a random field Z_t for t ∈ T, where T is (usually) a subset of k-dimensional Euclidean space (usually k = 1, 2 or 3). Over most of the space T, Z_t is random noise; but in a relatively small part of the space it may contain a signal that one wants to detect and locate as precisely as possible. A natural formulation leads to a class of irregular statistical problems (where standard likelihood theory cannot be expected to hold). I will describe scientific settings for some of these problems and discuss the problem of assessing the significance level of the likelihood ratio test for signal detection. This involves a tail approximation for the maximum of a random field of dimension at least k to account for the location, size and shape of the unknown signal. In some cases the field has smooth sample paths and geometric methods have proved very useful. When the sample paths are irregular, a change of measure argument has been found to be useful. In this talk I will concentrate on the case of smooth random fields and indicate how the change of measure can be adapted to replace the geometric methods by a more probabilistic argument.
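For readers unfamiliar with scan statistics, a brute-force Monte Carlo estimate of the tail probability in the simplest one-dimensional case shows what the analytic approximations discussed in the talk replace (window size, grid length and thresholds below are illustrative):

```python
import random

def scan_statistic(noise, w):
    # Maximum moving sum over all length-w windows: a 1-D scan statistic.
    return max(sum(noise[i:i + w]) for i in range(len(noise) - w + 1))

def tail_prob(b, n=200, w=10, reps=1000, seed=0):
    # Monte Carlo estimate of P(max window sum > b) under pure Gaussian noise.
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if scan_statistic(noise, w) > b:
            hits += 1
    return hits / reps

print(tail_prob(5.0))   # fairly likely under noise alone
print(tail_prob(12.0))  # rare: a window sum this large suggests a signal
```

The simulation is expensive and its accuracy degrades for very small tail probabilities, which is one motivation for the analytic approximations the talk describes.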



On Modelling Weather Indices and Pricing Weather Derivatives

Shuangzhe Liu, Australian National University


The weather has an enormous impact on business activities. The last five years have witnessed the rapid growth of the weather derivatives market to a remarkable level. The goal of this talk is to develop a new pricing approach for weather derivatives. We first review the background of weather derivatives, and then introduce our approach for analyzing and pricing weather derivatives.
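For context, many weather derivatives are written on degree-day indices; the sketch below is the standard heating-degree-day definition with the conventional 65°F base (a textbook index, not the pricing approach of the talk; the temperatures are illustrative):

```python
def hdd(daily_avg_temps_f, base=65.0):
    # Heating degree days: each day contributes max(0, base - avg temp).
    # Contracts typically pay a fixed amount per degree day over a season.
    return sum(max(0.0, base - t) for t in daily_avg_temps_f)

# Three illustrative daily average temperatures (°F)
print(hdd([50.0, 60.0, 70.0]))  # -> 20.0  (15 + 5 + 0)
```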



Economics and Cryptography on the Internet

Andrew M. Odlyzko, Digital Technology Center, University of Minnesota, Minneapolis, USA


Mathematical cryptography has provided a variety of interesting and important tools for e-commerce. The impact of these tools was widely expected to be revolutionary. Yet so far, although some of these mathematical techniques do play a vital role, many of the early expectations have been disappointed. The reason, which will be explored in this lecture, is that many of these tools conflict with the economic incentives that govern how e-commerce is conducted.



Pure Point Diffraction in Aperiodic Structures

Robert V. Moody, University of Alberta, Canada


One of the extraordinary features of both real physical quasicrystals and some of the famous aperiodic tilings, like the Penrose tilings, is that they are essentially pure point diffractive. Two problems arise from this. First, what kinds of discrete structures can we construct for which pure point diffractivity can be proved? Second, the age-old problem: what can we determine about a discrete structure by looking only at its diffraction image?

In this talk we will discuss diffraction and the mathematics that has been developed over the past few years to study it. We give a number of nice examples of structures for which pure point diffraction arises, and finally show how an ansatz called the cut-and-project formalism seems to unify the picture.

The talk will be suitable for a general mathematical audience.



The Assembly of the Human and Mouse Genomes

Gene Myers, Informatics Research, Celera Genomics


"Assembling" a very large genome from shotgun sequencing data was considered impossible when Jim Weber and the speaker proposed it for the Human Genome in 1996. In mid-1998, Celera was formed. By late 1999, the informatics research team at Celera had assembled the 130 Mbp Drosophila genome, after producing a whole-genome shotgun data set with enough reads to cover the genome 13 times over (a 13X data set). Over the last three years, the team has also produced assemblies of the Human and Mouse genomes. The results of these projects at Celera prove unequivocally that whole-genome shotgun sequencing is effective at delivering highly reliable reconstructions.
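The "13X" figure can be put in context with the standard Lander-Waterman coverage model (a back-of-the-envelope calculation, not Celera's assembler): under random shotgun sampling at c-fold coverage, the fraction of the genome covered by at least one read is about 1 - e^(-c). The read length below is an assumed typical Sanger value, not from the abstract.

```python
import math

def covered_fraction(c):
    # Lander-Waterman: expected fraction of genome covered at c-fold coverage.
    return 1.0 - math.exp(-c)

genome_bp = 130e6   # Drosophila genome size from the abstract
read_len = 550.0    # assumed typical Sanger read length (bp)
c = 13.0            # 13X coverage, as in the abstract

n_reads = c * genome_bp / read_len
print(f"{n_reads:.3e} reads, covered fraction {covered_fraction(c):.7f}")
```

At 13X, the uncovered fraction is only a few parts per million, which is why such deep data sets were produced before attempting the assembly.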



Back to The Future: Ancestral Inference in Molecular Biology

Simon Tavaré, University of Southern California


Technological advances in molecular biology have made it possible to survey genome-wide DNA sequence variation in natural populations. Many different types of data are now routinely generated and allow us to study molecular processes on a wide variety of time scales. For example, we can compare distantly related species, individuals from the same species, or cells within a given individual. The talk will focus on ancestral inference: what can we infer about molecular and historical processes using variation data? Several examples will be given to illustrate the ideas, including reconciliation of fossil and molecular estimates of divergence times, the estimation of the age of a mutation, and inference about the age of a colon tumor.
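As a concrete example of the models underlying such ancestral inference, a minimal simulation of Kingman's coalescent (sample size and replicate count below are illustrative) recovers the standard expectation E[T_MRCA] = 2(1 - 1/n) for the time back to the most recent common ancestor, in coalescent time units:

```python
import random

def time_to_mrca(n, rng):
    # While k lineages remain, the time to the next coalescence is
    # exponential with rate k(k-1)/2 (standard coalescent time units).
    t, k = 0.0, n
    while k > 1:
        t += rng.expovariate(k * (k - 1) / 2.0)
        k -= 1
    return t

rng = random.Random(42)
n, reps = 10, 20000
mean_t = sum(time_to_mrca(n, rng) for _ in range(reps)) / reps
print(mean_t)  # theory: 2 * (1 - 1/10) = 1.8
```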



Similarities of Human and Chimpanzee at the Genomic Level

Wen-Hsiung Li, University of Chicago


Thanks to the completion of a draft of the human genome and the sequencing of a substantial part of the chimpanzee genome, the following intriguing and long-standing question is ready to be answered: are human and chimpanzee very similar to each other in terms of DNA sequences? Since humans and chimpanzees are very different in terms of morphology and cognitive ability, it would be very surprising if they were genetically very similar. After presenting some basic knowledge of genetics for the general audience, the speaker will discuss the genomic similarities between human and chimpanzee.



Unraveling Genes: The Role of Mathematics and Statistics

Warren Ewens, University of Pennsylvania


Mathematics and statistics have been used since the early days of genetics to resolve problems that could not be handled by any non-quantitative approach. These problems are mainly concerned with evolutionary questions and the elucidation of the role of genes in genetic diseases.

With the drafting of the human genome in the year 2000, mathematics and statistics have an even greater role to play in modeling complex biological processes involving genes and proteins and in analyzing the huge amount of data and information amassed by the molecular biologists. Through several examples, it will be shown how statistical problems arise from genetic questions and how solutions to these statistical problems help answer those genetic questions.



Highlights of an Emerging Field: Computational Biology

Franco Preparata, Brown University, USA


Momentous developments in the second half of the twentieth century were the advent of the digital computing culture and of molecular biology. Great advances in the computing sciences and the acquisition of enormous amounts of biological data, with significant scientific and societal implications, have created ideal conditions for interdisciplinary activity in the form of an emerging field at the interface of biology and computer science.



Protein Diversity as a Result of Transcript Isoform Variation in Expressed Genes

Winston Hide, Vladimir N. Babenko, Peter A. van Heusden, and Janet F. Kelso

South African National Bioinformatics Institute, University of the Western Cape, Private Bag X17, Bellville 7535, South Africa. winhide(AT)


Completion of the human genome sequence has provided evidence for a gene count with a lower bound of 27,500–40,000. Significant protein complexity encoded from this gene set must derive in part from multiple transcript isoforms. Recent studies utilizing ESTs and reference mRNA data have revealed that alternative transcription, including alternative splicing, polyadenylation and transcription start sites, occurs within at least 30–40% of human genes. This is likely to be an underestimate, as EST sampling was used to derive the figures. Transcript form surveys have yet to integrate the genomic context, expression, and contribution to protein diversity of isoform variation. Exhaustive manual confirmation of genome sequence annotation, coupled with comparison to available expressed sequence data, has been used here to accurately associate isoforms showing exon skipping with genomic sequence, to reveal potential protein-coding alteration. In addition, relative expression levels of transcripts have been estimated from EST representation in the public databases. Our rigorous method has been applied to 545 described genes in the first intensive study of exon skipping based on chromosome 22 genome sequence and matched human transcripts. The study has led to the discovery of 62 exon-skipping events in 52 genes, with 57 exon skips altering the protein coding region. A single gene (FBXO7) expresses an exon repetition. EST sampling analysis indicates that 58.8% of highly represented multi-exon genes are likely to express exon-skipped isoforms, in ratios that vary from 1:1 to 1:>100. Comparisons with mouse show a similar overall level of skipping, although not at the same exon boundaries or genes. Analysis of cancer genes shows that aberrant forms of skipping may segregate with cancer expression libraries.



Inferring New Gene and Protein Functions

Christian Schoenbach, RIKEN Genomic Sciences Center


Gene and protein functions comprise multiple aspects, such as transcriptional regulation and protein-protein interactions. To investigate these multiple functional aspects, we need to make better use of the rich a priori knowledge in biological databases and the literature. First, I discuss results and problems of a semi-automated discovery pipeline for protein motifs that has been used to identify new motifs and infer functions from more than 21,000 conceptually translated mouse cDNA sequences. In the second part I focus on the development of a discovery pipeline for protein interactions in the context of disease associations, which includes a text information extraction module for PubMed abstracts.



Evidence of Adaptive Evolution Provides the Achilles' Heel of Pathogen Genomes: Candidates for Drug and Vaccine Targets

Junaid Gamieldien1, Betty Guo2, John J. Mekalanos2 and Winston A. Hide1

1 South African National Bioinformatics Institute, University of The Western Cape, S. Africa
2 Department of Microbiology and Molecular Genetics, Harvard Medical School, Boston, USA


Previous studies have shown that certain virulence factors of pathogenic bacteria, e.g. adhesins and outer membrane proteins, appear to be under positive selection. To determine whether new virulence-associated genes may be identified by screening for adaptive selection, we have performed an intraspecies search for orthologous genes that contain regions in which the rate of nonsynonymous substitution (KA) is greater than the rate of synonymous substitution (KS). We have analyzed pairs of strains of Helicobacter pylori and Neisseria meningitidis. A total of 85 genes undergoing pathoadaptive evolution were identified, of which a large proportion (41/85) code for known or potential virulence genes. Interestingly, it appears that specific cellular processes in N. meningitidis are under strong selection, as we have detected multiple genes involved in iron acquisition and DNA repair that have acquired adaptive mutations. Furthermore, of 21 H. pylori knockout mutants, 5 had decreased colonization efficiency and 8 were lethal, and thus classified as 'putative essential'. Given the demonstrated ability of our system to identify known and potential virulence factors, and the fact that 61% of the H. pylori genes tested in gene-knockout experiments were shown to play important roles in the organism's biology, we conclude that it may be an important tool for identifying novel drug and vaccine targets.
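The screening criterion can be illustrated with a toy codon-level comparison. This is not the authors' pipeline: real KA/KS estimators (e.g. Nei-Gojobori) also normalize by the numbers of nonsynonymous and synonymous sites, and the two short sequences below are hypothetical.

```python
# Standard genetic code, packed as a 64-character string in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def ka_ks_counts(seq1, seq2):
    # Classify each single-nucleotide codon difference between two
    # aligned coding sequences as synonymous (same amino acid) or
    # nonsynonymous; codons with 0 or >1 differences are skipped.
    nonsyn = syn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if sum(x != y for x, y in zip(c1, c2)) != 1:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return nonsyn, syn

seq_a = "ATGGCTTGTAAA"   # hypothetical ortholog: M A C K
seq_b = "ATGGCCAGTAAA"   # hypothetical ortholog: M A S K
nonsyn, syn = ka_ks_counts(seq_a, seq_b)
print(nonsyn, syn)  # 1 nonsynonymous (TGT->AGT), 1 synonymous (GCT->GCC)
```

An excess of nonsynonymous over synonymous changes per site is the KA > KS signal the abstract screens for.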



Recent Issues in Biopathway Extraction from Literature

Jong C. Park, Korea Advanced Institute of Science and Technology


As the importance of automatically extracting and analyzing natural language assertions about protein-protein interactions in biomedical publications has been recognized, many applications of natural language processing techniques have been proposed in the literature. However, most proposals to date make rather simplifying assumptions about the syntactic aspects of natural language, for various reasons including efficiency. Here, we describe an implemented system that utilizes combinatory categorial grammar, known to be competent in modeling natural language, with a controlled mechanism that allows the parser to operate bidirectionally and incrementally. We discuss the performance of the system on a large set of MEDLINE abstracts, with encouraging results.
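To make the extraction task concrete, here is a naive surface-pattern baseline (the protein names and verb list are illustrative, and the talk's CCG-based parser is precisely what goes beyond such simplifying pattern matching):

```python
import re

# Naive baseline: match "PROTEIN <interaction-verb> PROTEIN" on the surface.
INTERACTION = re.compile(
    r"\b([A-Z][A-Za-z0-9-]+)\s+"
    r"(?:interacts with|binds(?: to)?|activates|inhibits|phosphorylates)\s+"
    r"([A-Z][A-Za-z0-9-]+)"
)

def extract_interactions(text):
    # Return (agent, target) pairs matched by the patterns above.
    return INTERACTION.findall(text)

abstract = ("We show that RAD51 interacts with BRCA2 in vivo, "
            "and that ATM phosphorylates CHK2 after DNA damage.")
print(extract_interactions(abstract))  # [('RAD51', 'BRCA2'), ('ATM', 'CHK2')]
```

Such patterns miss passives, coordination and nested clauses, which is why full syntactic analysis of the kind described in the abstract is needed.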



Prokaryote Phylogeny Based on Complete Genomes --- A Compositional Approach

Bailin Hao, Institute of Theoretical Physics and Beijing Genomics Institute, Academia Sinica


The availability of an ever-growing number of complete genome sequences raises the problem of how to infer phylogenetic relationships from these data. As different organisms have genomes of different sizes and different numbers and orderings of genes, the traditional approach of molecular phylogeny based on alignment of conserved genes encounters much difficulty and has led to controversial results. In particular, the placement of hyperthermophilic bacteria on the tree of life has triggered a heated debate on whether there has been massive lateral gene transfer in evolutionary history. We propose a new way of inferring phylogeny without performing sequence alignments. It uses K-string compositions to construct a representative vector for each species. This approach yields promising results when applied to prokaryote genomes.
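The flavor of the composition approach can be sketched as follows. This is a simplified version: the actual composition-vector method also subtracts frequencies predicted by a lower-order Markov model, and the sequences, K and distance below are illustrative choices.

```python
from itertools import product
from math import sqrt

def kstring_vector(seq, k=3, alphabet="ACGT"):
    # Frequency vector of all overlapping K-strings, in a fixed order
    # over the 4^K possibilities, so vectors are comparable across species.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        counts[seq[i:i + k]] += 1
    total = max(1, len(seq) - k + 1)
    return [counts[km] / total for km in kmers]

def cosine_distance(u, v):
    # 1 - cosine similarity: an alignment-free distance between vectors.
    dot = sum(x * y for x, y in zip(u, v))
    return 1.0 - dot / (sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v)))

# Toy "genomes": g2 is a slightly mutated copy of g1, g3 is unrelated.
g1 = "ATGGCGTACGTTAGCATGGCGT" * 10
g2 = "ATGGCGTACGATAGCATGGCGA" * 10
g3 = "TTTTACGCGCGATATATTTACG" * 10

v1, v2, v3 = (kstring_vector(g, k=3) for g in (g1, g2, g3))
print(cosine_distance(v1, v2) < cosine_distance(v1, v3))
```

The pairwise distances then feed a standard distance-based tree-building step, with no alignment required at any point.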

