Institute for Mathematical Sciences                                        Programs & Activities





Workshop on Modeling in Molecular Biology


Jointly organized by Institute for Mathematical Sciences (IMS) and Laboratories for Information Technology (LIT)

~ Abstracts ~

Bridging nonliving and living matter: step by step assembly of a proto-organism
Steen Rasmussen, Los Alamos National Lab, USA


(Joint work with L.Chen, M.Nilsson, K.Tunstroem, and S.Abe)

Life on earth, and possibly elsewhere, begins with a proto-organism, a simple self-replicating molecular system. A proto-organism consists of a cooperative coupling between a compartment, a metabolic reaction network, and polymers that can store and replicate encoded information. Assembling non-biological materials (geomaterials) into a proto-organism constitutes a bridge between nonliving and living matter. We review some of the experimental and theoretical work on bridging nonliving and living matter currently under way at Los Alamos, Argonne, and several universities. We investigate simple molecular systems in the lab and in simulation that have certain basic properties of the living state, and we integrate these into complex self-assembled molecular systems that incorporate properties such as energy capture, metabolism, and self-replication. The set of theoretical and computational methods we use enables us to study molecular self-assembly processes on time scales from picoseconds to minutes. We review some of the theoretical and computational results on molecular self-organization (e.g., micellation dynamics, vesicle self-reproduction, and template-directed self-replication in lipid aggregates) and show the relationship between a set of thermodynamic observables in experiment, simulation, and theory. Finally, we present an ansatz for generating dynamical hierarchies in formal systems, which has emerged. For the longer term, we believe that this work helps define the foundation of new engineering approaches for a Living Technology based on self-organization and evolution, to the benefit of fundamental biology, material and environmental science, as well as human health. Perhaps most importantly, demonstrating how life can self-organize from its building blocks will resolve a central scientific question in the ancient puzzle about who we are and where we come from.



Modeling, Analysis and Simulation of Molecular Computation
Masami Hagiya, Department of Computer Science, Graduate School of Information Science and Technology, University of Tokyo


Molecular computing is a research field that analyzes the computational power of molecules and molecular interactions, and seeks engineering applications in areas including information technology, biotechnology and nanotechnology. Research in molecular computing begins with defining a computational model of molecular reactions and analyzing the model from the standpoints of computability and complexity. Along with this analysis, the problem of how to design molecules and molecular reactions that realize the computational model efficiently is also investigated. Simulating molecular reactions on conventional computers is an important tool for this purpose.


In this lecture, I first summarize some of the computational models of molecular reactions that have been proposed and implemented in the field of DNA and molecular computing. They include:


  • the Adleman-Lipton paradigm and its refinements, including Suyama's,
  • the self-assembly of various forms of DNA including DNA tiles, which was first sought by Seeman and Winfree,
  • the splicing model proposed by Head, and
  • the models of DNA automata, including Hagiya's and Shapiro's.


I then touch upon some efforts to analyze the computational power of these models. They are roughly divided into those on computability and those on complexity. The research on complexity of molecular computation is also divided into discrete and physical. In the latter kind of research, physical properties of molecular reactions are taken into account and probabilistic analysis of reactions is made.


I also explain some techniques and tools for designing molecules and molecular reactions for implementing the above models. Sequence design is a problem that has been investigated from the beginning of DNA computing, and various techniques and tools have been developed, among which those based on genetic algorithms and those based on coding theory are widely used. Recently, estimation of the energy of secondary structures of DNA has been increasingly addressed in sequence design.


Another kind of tool for designing molecular reactions is computer simulation. To make simulation usable for designing reactions, and in particular for tuning parameters of molecular reactions such as temperature and salt concentration, it is important to adopt a reaction model at an appropriate level of abstraction. This again raises the issue of what constitutes a good model of molecular computation. I will touch upon some efforts in the direction of simulating molecular computation.



Molecular Information Theory
Sungchul Ji, Rutgers University, USA


The nature of physical systems that have been investigated by scientists throughout history appears to have evolved in two major steps:


Deterministic --(#1)--> Stochastic --(#2)--> Informational
[Diagram labels whose positions are not recoverable: "Statistical &", "New Kind of Science (?)"]


Transition #1, from the deterministic science of Newton to the stochastic science of Boltzmann and Heisenberg, was occasioned by the discovery of the microworld and the attendant development of statistical mechanics and quantum mechanics. Transition #2, from stochastic science to an evolutionary one, was inaugurated by Darwin's discovery of biological evolution. Both of these transitions were essential to the emergence of modern biology, particularly molecular/cell biology. Just as statistical mechanics (SM) and quantum mechanics (QM) must be consistent with Newtonian mechanics (NM) and yet exhibit novel features not found in NM (i.e., uncertainty), so it seems that biology must be consistent not only with NM but also with SM and QM, and yet can exhibit novel features, here identified with information. The basic science dealing with uncertainty is probability theory. The science concerned with information is a branch of probability theory known as information theory.


For the purpose of this presentation, information will be defined as the ability to reduce uncertainty, just as energy is defined as the ability to do work. As is usual in information theory, I will express uncertainty in terms of the Shannon entropy H. When all events involved have equal probability of occurrence, the Shannon equation assumes a simple form, H = log2 n bits, where n is the total number of possible events. The amount of information, I, that is required to reduce uncertainty from H_initial = log2 n0 to H_final = log2 n can be computed as I(H_initial → H_final) = H_initial - H_final = log2 (n0/n) bits, where n0 is the number of events out of which information I enabled the selection of n events.


Applying the above definition of information to molecular biology, we can define molecular information, Im, as the ability of a molecular system (e.g., enzyme, the cell, etc.) to select n out of n0 possible molecular events, states, or processes: Im = log2 (n0/n) bits. We can interpret this equation as indicating that Im bits of information will enable a molecular system to make n correct selections out of n0 possible choices (if requisite free energy is provided from some free energy source).
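As a small illustration of the formula Im = log2 (n0/n), the sketch below computes the number of bits needed to narrow n0 equally likely alternatives down to n. The function name and the example numbers are hypothetical and are not taken from the talk.

```python
import math


def selection_information_bits(n0: int, n: int) -> float:
    """Bits needed to select n out of n0 equally likely events: log2(n0 / n).

    Hypothetical helper for illustration; assumes all events are equiprobable,
    as in the simplified Shannon formula H = log2 n.
    """
    if n < 1 or n0 < n:
        raise ValueError("require 1 <= n <= n0")
    return math.log2(n0 / n)


# Selecting 1 state out of 8 equally likely states takes 3 bits.
print(selection_information_bits(8, 1))
# Narrowing 1024 states down to 4 takes 8 bits.
print(selection_information_bits(1024, 4))
```

Note that this captures only the quantitative aspect of information; it says nothing about the free energy needed to carry out the selection.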

As indicated above, Im alone cannot make a selection. That is, Im is necessary but not sufficient to drive a selection process. Therefore, the molecular system utilizing the information must be able to provide the free energy required for the selection process. Otherwise, the system will end up violating the laws of thermodynamics. Another deficiency of the above definition of information is that it addresses only the quantitative aspect of information and ignores the meaning and the value of information, the two aspects of molecular information that are crucial in molecular biology. The necessity of taking these two additional aspects of molecular information into account in biology will be discussed in the context of the Bhopalator, a molecular model of the living cell formulated in 1983.



The Conformon Theory of Molecular Machines
Sungchul Ji, Rutgers University, USA


Molecular machines such as enzymes, ion pumps, and molecular motors are products of biological evolution and hence "carry" molecular information as defined in Molecular Information Theory. Therefore, it can be anticipated that the behaviors of molecular machines cannot be completely accounted for, or understood, in terms of the laws of physics and chemistry alone but only in terms of BOTH the laws of physics and chemistry AND the rules forged by biological evolution. What distinguishes the laws of physics and chemistry from the rules of biology is the inexorability of the former and the arbitrariness of the latter. As a result, molecular machines will exhibit behaviors that appear inexorable or arbitrary, depending on the mode of observation.


One of the inexorable aspects of a molecular machine is that its direction of operation (e.g., the Na+/K+ ATPase moving sodium out of or into the cell) is completely determined by the sign of the accompanying Gibbs free energy change: it always operates in the direction of decreasing this form of free energy (under constant temperature and pressure). One of the arbitrary aspects of a molecular machine is the relation between binding free energy and the direction of catalysis, either positive (i.e., rate enhancement) or negative (i.e., rate inhibition). In other words, the same amount of binding free energy engendered by the interaction between a substrate and its enzyme catalytic site can be used to either decrease or increase the activation free energy barrier for the chemical reaction being catalyzed, depending entirely on the nature of the mechanical interactions between the substrate and the catalytic site of the enzyme, which in turn depend on the amino acid sequence information of both the catalytic site and the rest of the enzyme. Thus what drives catalysis is not the binding free energy alone, as has been advocated by W. Jencks and many others (e.g., Krupka, Hill, Eisenberg, Astumian, etc.), but also the genetic information encoded in the shape (i.e., the 3-dimensional structure) of an enzyme. In fact, it may be asserted that the shapes of enzymes are primary in the phenomenon of enzyme catalysis in that shapes carry not only genetic information but also free energy (e.g., the energized myosin head), so that changes in shape are able to drive catalytic acts, including translocation and transformation of bound ligands. The combination of free energy and genetic information in the form of sequence-specific conformational strains of enzymes (local energized shapes, LES) (and of other biopolymers such as DNA) was given the name conformons in 1972. During the past three decades, conformons (LES) have been found to provide the necessary and sufficient conditions to account for many goal-directed molecular processes inside the cell [Ji, S., BioSystems 54: 107-130 (2000)].


The conformon theory will be applied to the mechanisms of action of the glucose carrier and the Na+/K+ ATPase. A preliminary attempt will be made to represent the conformon model of the Na+/K+ ATPase using the stochastic pi-calculus described by G. Ciobanu [2000, 2001, 2002], in order to establish a fundamental link between molecular biology and computer science. Such a fundamental link was suggested by the isomorphism postulated in 1991 to exist between the cell and cellular automata, which appears to be in agreement with the (weak version of the) Principle of Computational Equivalence recently formulated by S. Wolfram in A New Kind of Science [Wolfram Media, Inc., Urbana-Champaign, 2002].



Exploiting Conserved Synteny in Genome-by-Genome Ortholog Mapping
Phil Long, Genome Institute of Singapore

(Joint work with K.R.K. Murthy, V. Vega and E. Liu)


Pairing genes in a lower organism with their equivalent counterparts in humans (their orthologs) is an important step in investigation of the molecular basis of human disease. The most prominent high-throughput, fully automated methods for ortholog pairing predict whether a pair of genes are orthologs based on the similarity between DNA or RNA sequences associated with them, possibly supplemented by comparison with a sequence associated with a related gene in a third organism.


Rearrangements of genomes during evolution often leave long stretches of DNA intact. Thus, neighboring genes often travel together through evolution, possibly undergoing mutations along the way (when this happens, it is called conserved synteny). As a result, when assessing whether a pair of genes are orthologs, supporting evidence can be obtained by examining whether genes nearby on their respective chromosomes have similar sequences.

We describe a method for incorporating the evidence due to conserved synteny into a high-throughput system for predicting which pairs of genes in two genomes are orthologs. Using the human and mouse genomes, we provide evidence that incorporating conserved synteny into an ortholog pairing system in the way we propose results in a substantial improvement in accuracy.
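The idea of using neighboring genes as supporting evidence can be sketched as follows. This is only a toy illustration, not the authors' method: the function name, the `window` parameter, the gene identifiers, and the precomputed set of sequence-similar pairs are all hypothetical.

```python
def synteny_support(order_a, order_b, gene_a, gene_b, similar, window=2):
    """Count neighbors of gene_a that have a sequence-similar neighbor of gene_b.

    order_a, order_b: gene identifiers in chromosomal order (one list per genome).
    similar: set of (gene_in_a, gene_in_b) pairs judged similar by sequence alone.
    window: how many positions to either side count as "nearby" (assumed value).
    """
    ia, ib = order_a.index(gene_a), order_b.index(gene_b)
    near_a = [g for j, g in enumerate(order_a) if j != ia and abs(j - ia) <= window]
    near_b = [g for j, g in enumerate(order_b) if j != ib and abs(j - ib) <= window]
    return sum(1 for ga in near_a if any((ga, gb) in similar for gb in near_b))


# Hypothetical data: h3 is sequence-similar to both m3 and m3b.
human = ["h1", "h2", "h3", "h4", "h5"]
mouse = ["m2", "m3", "m4", "x1", "x2", "m3b", "x3"]
similar = {("h2", "m2"), ("h3", "m3"), ("h3", "m3b"), ("h4", "m4")}

print(synteny_support(human, mouse, "h3", "m3", similar, window=1))   # neighbors agree
print(synteny_support(human, mouse, "h3", "m3b", similar, window=1))  # neighbors do not
```

In this toy data, sequence similarity alone cannot decide between m3 and m3b, while the neighborhood count favors m3. A real system would presumably combine such a score with the sequence evidence rather than use it on its own.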



Endogenous fluctuations in gene regulation
Thomas Kepler, Santa Fe Institute, USA


The regulation of gene expression plays a fundamental role in the dynamics of cellular life. These processes are subject to significant stochasticity, due partly to the random waiting times among synthesis and degradation reactions involving a finite collection of transcripts. Additional stochasticity is attributable to the random transitions among the discrete operator states controlling the rate of transcription. This innate stochasticity can have quantitative and qualitative impact on the behavior of gene-regulatory networks. We develop a Markov model to which these random reactions are intrinsic, as well as a series of simpler models derived explicitly from the first as approximations in different parameter regimes. For their analysis, we introduce a natural generalization of deterministic bifurcations for the classification of stochastic systems. We show that simple noisy genetic switches have rich bifurcation structures; among them are bifurcations driven solely by changing the rate of operator fluctuations even as the "underlying" deterministic system remains unchanged. We find stochastic bistability where the deterministic equations predict monostability, and vice versa. We derive and solve equations for the mean waiting times for spontaneous transitions between quasistable states in these switches.
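The two sources of randomness described above (operator-state switching, and transcript synthesis and degradation) can be illustrated with a generic stochastic simulation. The sketch below is a standard Gillespie-style simulation of a single two-state gene, not the Markov model of the talk; the function name, rate parameters, and values are assumptions chosen for illustration.

```python
import random


def simulate_telegraph_gene(k_on, k_off, k_syn, k_deg, t_end, seed=0):
    """Simulate one gene whose operator flips between inactive and active.

    Transcripts are synthesized at rate k_syn only while the operator is
    active, and each transcript degrades at rate k_deg.
    Returns a list of (time, operator_active, transcript_count) tuples.
    """
    rng = random.Random(seed)
    t, active, n = 0.0, False, 0
    history = [(t, active, n)]
    while True:
        rates = [
            k_off if active else k_on,  # operator switches state
            k_syn if active else 0.0,   # synthesis of one transcript
            k_deg * n,                  # degradation of one transcript
        ]
        total = sum(rates)
        if total == 0.0:                # nothing can happen any more
            break
        t += rng.expovariate(total)     # exponential waiting time
        if t > t_end:
            break
        r = rng.random() * total        # choose which reaction fires
        if r < rates[0]:
            active = not active
        elif r < rates[0] + rates[1]:
            n += 1
        elif n > 0:
            n -= 1
        history.append((t, active, n))
    return history


history = simulate_telegraph_gene(k_on=0.5, k_off=0.5, k_syn=10.0, k_deg=1.0, t_end=20.0, seed=1)
print(len(history), history[-1])
```

Slowing the operator switching (smaller k_on and k_off with the same ratio) leaves the mean-field equations unchanged but changes how long the gene dwells in each state, which is the kind of effect the abstract attributes to operator fluctuations.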



Modeling and inference with random processes and the minimum description length criterion
Tom Kepler, Santa Fe Institute, USA


In the statistical analysis of data, the variability of the data is partitioned into regularity and randomness. The variability classified as regularity then becomes the object of further inquiry; it is the part that must be explained. The variability classified as random is discarded, since there is, by definition, nothing to explain. In practice, this partitioning occurs relative to a particular model; the regularities are embodied in advance by the model and the residuals are whatever remains unexplained by the model. In biology, we are now often faced with the analysis of large, structured datasets. More often than not, the form of the model cannot be discerned in advance. Nevertheless, the partitioning of variability remains a critical step in the discovery of patterns as well as in generalized model comparison and hypothesis testing.


Minimum description length (MDL) techniques provide an information-theoretic framework for model comparison and data analysis based on the minimization of total description length for the model and residuals together rather than on null-hypothesis significance testing. I will describe our efforts to extend the MDL method to Gaussian process models, for which the models themselves are random, and are estimated essentially by Bayes' rule. I will provide examples of the use of this modeling technique for the analysis of DNA sequence data and other biological systems.
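To make the idea of minimizing the total description length of model plus residuals concrete, here is a minimal sketch using a crude two-part code: Gaussian residuals plus a cost of (1/2) log2 n bits per fitted parameter. This is a common textbook approximation and not the Gaussian-process extension described in the talk; the function names and data are hypothetical.

```python
import math


def description_length_bits(residuals, n_params):
    """Approximate two-part code length: residual bits plus parameter bits.

    The residual term is a Gaussian differential code length, so it can be
    negative; only differences between models fitted to the same data matter.
    """
    n = len(residuals)
    sigma2 = max(sum(r * r for r in residuals) / n, 1e-12)
    data_bits = 0.5 * n * math.log2(2 * math.pi * math.e * sigma2)
    model_bits = 0.5 * n_params * math.log2(n)
    return data_bits + model_bits


def constant_residuals(xs, ys):
    mean = sum(ys) / len(ys)
    return [y - mean for y in ys]


def line_residuals(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


xs = list(range(20))
noise = [0.1 * (-1) ** i for i in xs]          # small deterministic "noise"
trend = [2.0 * x + e for x, e in zip(xs, noise)]

for name, ys in (("trend", trend), ("noise only", noise)):
    dl_const = description_length_bits(constant_residuals(xs, ys), 1)
    dl_line = description_length_bits(line_residuals(xs, ys), 2)
    print(name, "-> prefer", "line" if dl_line < dl_const else "constant")
```

The extra parameter of the line is only worth its cost when it shortens the description of the residuals by more than the parameter penalty.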



Process algebra and model checking in molecular biology
Gabriel Ciobanu, National University of Singapore


This talk presents a discrete mathematical description of the cellular process of the sodium-potassium exchange pump in terms of the pi-calculus process algebra. The equations of the Albers-Post model are translated into an appropriate operational semantics which can describe both the protein interactions (conformational transformations) and the membrane transport occurring in the pump mechanism. In this way a computational model is obtained whose properties can be checked automatically. We motivate the use of the pi-calculus as an adequate formalism for molecular processes by describing the dynamics of the sodium-potassium exchange pump, an important physiological process present in all animal cells. This molecular process involves phenomena related to distribution and cooperation, but also to mobility and adaptability. Using the stochastic pi-calculus, we describe the molecular interactions and conformational transformations in an explicit way. We manipulate the changing conformations formally and describe the corresponding dynamic systems using discrete mathematics instead of the usual partial differential equations. The transfer mechanisms are described in more detail, step by step. Moreover, we can use software tools to verify properties of the described systems.
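The flavor of a discrete, checkable description can be conveyed with a much simpler formalism than the pi-calculus. The sketch below encodes a simplified textbook version of the pump cycle as a labelled transition system and checks one property by exhaustive exploration. It is not the pi-calculus encoding from the talk; the state names and helper functions are hypothetical.

```python
# Simplified pump cycle: state -> (action label, next state).
# E1/E2 are the two conformations; "P" marks the phosphorylated enzyme.
CYCLE = {
    "E1":      ("bind 3 Na+ (inside)",      "E1.3Na"),
    "E1.3Na":  ("phosphorylate using ATP",  "E1P.3Na"),
    "E1P.3Na": ("change conformation",      "E2P.3Na"),
    "E2P.3Na": ("release 3 Na+ (outside)",  "E2P"),
    "E2P":     ("bind 2 K+ (outside)",      "E2P.2K"),
    "E2P.2K":  ("dephosphorylate",          "E2.2K"),
    "E2.2K":   ("change conformation",      "E1.2K"),
    "E1.2K":   ("release 2 K+ (inside)",    "E1"),
}


def reachable(start):
    """Return every state reachable from `start` (start included)."""
    seen, state = set(), start
    while state not in seen:
        seen.add(state)
        state = CYCLE[state][1]
    return seen


def run_cycle(start="E1"):
    """Follow transitions from `start` until the pump returns to it."""
    trace, state = [], start
    while True:
        action, state = CYCLE[state]
        trace.append((action, state))
        if state == start:
            return trace


# A toy "model checking" query: no state is a dead end or unreachable.
print(reachable("E1") == set(CYCLE))
for action, state in run_cycle():
    print(f"{action:28s} -> {state}")
```

A process-algebra model additionally represents the ions and the enzyme as concurrent, communicating processes, which this deterministic sketch does not attempt.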



Discrete models of the immune system
Santo Motta, Dept. of Mathematics and Computer Science, University of Catania, Italy


The first part of the talk presents a brief review of the mathematical framework of Immune System models, pointing out the advantages and disadvantages of the two major approaches, namely continuous and discrete models. The second part introduces the concept and characteristics of cellular automata. The third part presents the main characteristics of the Celada-Seiden model of the Immune System. This model, based on cellular automata, is very rich in biological detail. Finally, the last part shows how one can use an Immune System model to perform pattern recognition based on the Immune algorithm.
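For readers unfamiliar with cellular automata, the following minimal sketch shows the basic concept: a row of cells updated in parallel by a local rule. It is a generic one-dimensional, two-state automaton with Wolfram rule numbering, far simpler than the Celada-Seiden model; the function name and the periodic boundary are choices made for this illustration.

```python
def step(cells, rule):
    """Apply one synchronous update of an elementary cellular automaton.

    cells: list of 0/1 values, treated as a ring (periodic boundary).
    rule:  integer 0..255; bit k gives the new state for the neighborhood
           whose (left, centre, right) values spell k in binary.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (centre << 1) | right
        new_cells.append((rule >> index) & 1)
    return new_cells


# Rule 90 sets each cell to the XOR of its two neighbors.
row = [0, 0, 1, 0, 0]
print(step(row, 90))  # [0, 1, 0, 1, 0]
```

Every cell follows the same rule and sees only its immediate neighborhood, which is the defining characteristic that richer biological models build on.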



A new kind of science
Stephen Wolfram, Wolfram Research, Inc., USA


The recent release of Stephen Wolfram's book A NEW KIND OF SCIENCE has created an immense wave of interest in the new intellectual structure that he presents. The book has been available to the scientific public for only a little more than a month now, and so the process of careful examination and application of Wolfram's work is in its early stages. In this talk Stephen Wolfram will provide an overview of the key ideas and discoveries in NKS and discuss opportunities and directions of relevance to the community working on mathematical models of biological systems and processes. This community is well poised to be an early adopter of the research direction set out by Wolfram's work. The talk will end with a question and answer session with Dr. Wolfram.


