# Institute for Mathematical Sciences Programs & Activities

**Workshop on Modeling in Molecular Biology**

**Jointly organized
by Institute for Mathematical Sciences (IMS) and Laboratories
for Information Technology (LIT)**

**~ Abstracts ~**

**Bridging nonliving and living
matter: step by step assembly of a proto-organism**

*Steen Rasmussen, Los Alamos National Lab, USA*

(Joint work with L. Chen, M. Nilsson, K. Tunstroem, and S. Abe)

Life on earth, and possibly elsewhere, begins with a proto-organism, a simple self-replicating molecular system. A proto-organism consists of a cooperative coupling between a compartment, a metabolic reaction network, and polymers that can store and replicate encoded information. Assembling non-biological materials (geomaterials) into a proto-organism constitutes a bridge between nonliving and living matter. We review some of the experimental and theoretical work on bridging nonliving and living matter currently under way at Los Alamos, Argonne, and several universities. We investigate simple molecular systems, in the lab and in simulation, that have certain basic properties of the living state, and we integrate these into complex self-assembled molecular systems that incorporate properties such as energy capture, metabolism, and self-replication. The set of theoretical and computational methods we use enables us to study molecular self-assembly processes on time scales from picoseconds to minutes. We review some of the theoretical and computational results on molecular self-organization (e.g. micellation dynamics, vesicle self-reproduction, and template-directed self-replication in lipid aggregates) and show the relationship between a set of thermodynamic observables in experiment, simulation, and theory. Finally, we present an ansatz for generating dynamical hierarchies in formal systems, which has emerged from this work. For the longer term, we believe that this work helps define the foundation of new engineering approaches for a Living Technology based on self-organization and evolution, to the benefit of fundamental biology, material and environmental science, as well as human health. Perhaps most importantly, demonstrating how life can self-organize from its building blocks will resolve a central scientific question in the ancient puzzle about who we are and where we come from.

**Modeling, Analysis and Simulation
of Molecular Computation**

*Masami Hagiya, Department of Computer Science,
Graduate School of Information Science and Technology,
University of Tokyo*

Molecular computing is a research field that tries to analyze the computational power of molecules and molecular interactions, and seeks engineering applications in areas including information technology, biotechnology, and nanotechnology. Research in molecular computing begins with defining a computational model of molecular reactions, and analyzing the model from the standpoints of computability and complexity. Along with the analysis, the problem of how to design molecules and molecular reactions that realize the computational model efficiently is also investigated. Simulating molecular reactions on conventional computers is an important tool for this purpose.

In this lecture, I first summarize some of the computational models of molecular reactions that have been proposed and implemented in the field of DNA and molecular computing. They include:

- the Adleman-Lipton paradigm and its refinements, including Suyama's,
- the self-assembly of various forms of DNA including DNA tiles, first pursued by Seeman and Winfree,
- the splicing model proposed by Head, and
- the models of DNA automata, including Hagiya's and Shapiro's.

I then touch upon some efforts to analyze the computational power of these models. They are roughly divided into those on computability and those on complexity. The research on the complexity of molecular computation is further divided into discrete and physical approaches. In the latter kind of research, physical properties of molecular reactions are taken into account and probabilistic analysis of reactions is carried out.

I also explain some techniques and tools for designing molecules and molecular reactions for implementing the above models. Sequence design is a problem that has been investigated since the beginning of DNA computing, and various techniques and tools have been developed, among which those based on genetic algorithms and those based on coding theory are widely used. Recently, estimation of the energy of DNA secondary structures has been increasingly addressed in sequence design.
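The coding-theory side of sequence design can be illustrated with a minimal sketch: require that every pair of equal-length DNA words differ in at least a minimum Hamming distance, which limits unintended cross-hybridization. The function names and the distance threshold below are hypothetical illustrations; real design tools also weigh thermodynamic criteria such as secondary-structure energies and melting temperatures.

```python
from itertools import combinations

def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))

def valid_word_set(words, d_min=3):
    """Coding-theory-style check for a DNA word set: every pair of words
    must differ in at least d_min positions."""
    return all(hamming(a, b) >= d_min for a, b in combinations(words, 2))

print(valid_word_set(["ACGT", "TGCA"]))  # True: the words differ in every position
print(valid_word_set(["ACGT", "ACGA"]))  # False: they differ in only one position
```

A real word-set designer would search for the largest such set (a code), rather than merely checking a candidate set as this sketch does.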

Another kind of tool for designing molecular reactions is that of computer simulation. In order to make simulation usable for designing reactions, in particular, for tuning parameters of molecular reactions, such as temperature and salt concentration, it is important to adopt a reaction model of an appropriate abstraction level. This raises again the issue of what are good models of molecular computation. I will touch upon some efforts in the direction of simulating molecular computation.

**Molecular Information Theory**

*Sungchul Ji, Rutgers University, USA*

The nature of physical systems that have been investigated by scientists throughout history appears to have evolved in two major steps:

|  | #1 |  | #2 |  |
|:---:|:---:|:---:|:---:|:---:|
| **Deterministic** | → | **Stochastic** | → | **Informational** |
| Newtonian mechanics |  | Statistical & quantum mechanics |  | Biology; Wolfram’s *New Kind of Science* (?) |

Transition #1, from the deterministic science of Newton to the
stochastic science of Boltzmann and Heisenberg, was occasioned
by the discovery of the microworld and the attendant
development of statistical mechanics and quantum mechanics.
Transition #2, from stochastic science to an evolutionary one,
was inaugurated by Darwin’s discovery of biological evolution.
Both these transitions were essential in the emergence of
modern biology, particularly that of molecular/cell biology.
Just as statistical mechanics (SM) and quantum mechanics (QM)
must be consistent with Newtonian mechanics (NM) and yet
exhibit novel features not found in NM (i.e., *uncertainty*),
so it seems that biology must be consistent with not only NM
but also SM and QM and yet can exhibit novel features, here
identified with *information*. The basic science dealing
with uncertainty is *probability theory*. The science
concerned with information is a branch of probability theory
known as *information theory*.

For the purpose of this presentation, information will be
defined as *the ability to reduce uncertainty*, just as
energy is defined as *the ability to do work*. As is
usual in information theory, I will express uncertainty in
terms of Shannon entropy H. When all events involved have
equal probabilities of occurrence, the Shannon equation assumes
a simple form, H = log_{2} n bits, where n is the
total number of possible events. The amount of information, I,
that is required to reduce uncertainty from H_{initial} to H_{final}
can be computed as I(H_{initial} → H_{final})
= log_{2} (n_{0}/n) bits, where n_{0} is the number of
events out of which information I enabled the selection of n
events.

Applying the above definition of information to molecular
biology, we can define *molecular information*, I_{m},
as the ability of a molecular system (e.g., an enzyme, the cell,
etc.) to select n out of n_{0} possible molecular events, states,
or processes: I_{m} = log_{2} (n_{0}/n) bits. We
can interpret this equation as indicating that I_{m} bits of
information will enable a molecular system to make n correct
selections out of n_{0} possible choices (if the requisite free
energy is provided from some free energy source).
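As a minimal numerical sketch of this definition (with hypothetical numbers), I_{m} = log_{2}(n_{0}/n) can be computed directly:

```python
import math

def molecular_information(n0: int, n: int) -> float:
    """Bits needed to narrow n0 equiprobable molecular events down to n:
    I_m = log2(n0 / n)."""
    if not 0 < n <= n0:
        raise ValueError("require 0 < n <= n0")
    return math.log2(n0 / n)

# Hypothetical example: selecting 1 correct state out of 8 equiprobable ones
print(molecular_information(8, 1))  # 3.0 bits
```

When n = 1 this reduces to the initial Shannon entropy H = log_{2} n_{0}, consistent with the equal-probability form of the Shannon equation above.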

As indicated above, I_{m} alone cannot make a
selection. That is, I_{m} is necessary but not
sufficient to drive a selection process. Therefore, the
molecular system utilizing the information must be able to
provide the free energy required for the selection process.
Otherwise, the system will end up violating the laws of
thermodynamics. Another deficiency of the above definition of
information is that it addresses only the quantitative aspect
of information and ignores the *meaning* and the *value* of information, the two aspects of molecular information
that are crucial in molecular biology. The necessity of taking
these two additional aspects of molecular information into
account in biology will be discussed in the context of *the Bhopalator*,
a molecular model of the living cell formulated in 1983.

**The Conformon Theory of Molecular
Machines**

*Sungchul Ji, Rutgers University, USA*

Molecular machines such as enzymes, ion pumps, and
molecular motors are products of biological evolution and
hence “carry” molecular information as defined in *Molecular
Information Theory*. Therefore, it can be anticipated that
the behaviors of molecular machines cannot be completely
accounted for, or understood, in terms of the laws of physics
and chemistry alone but only in terms of BOTH the *laws* of physics and chemistry AND the *rules* forged by
biological evolution. What distinguishes the laws of physics
and chemistry from the rules of biology is the *inexorability* of the former and the *arbitrariness* of the latter.
As a result, molecular machines will exhibit behaviors that
appear *inexorable* or *arbitrary*, depending on the
mode of observation.

One of the inexorable aspects of a molecular machine is that
its direction of operation (e.g., the Na^{+}/K^{+} ATPase
moving sodium out of or into the cell) is completely
determined by the sign of the accompanying Gibbs free energy
change: it always operates in the direction of decreasing this
form of free energy (under constant temperature and pressure).
One of the *arbitrary* aspects of a molecular machine is
the relation between binding free energy and the direction of
catalysis, either positive (i.e., rate enhancement) or
negative (i.e., rate inhibition). In other words, the same
amount of the binding free energy engendered by the
interaction between a substrate and its enzyme catalytic site
can be used to either decrease or increase the activation free
energy barrier for the chemical reaction being catalyzed,
depending totally on the nature of mechanical interactions
between the substrate and the catalytic site of the enzyme,
which in turn depends on the amino acid sequence information
of both the catalytic site and the rest of the enzyme. Thus
what drives catalysis is not the binding free energy alone as
has been advocated by W. Jencks and many others (e.g., Krupka,
Hill, Eisenberg, Astumian, etc.) but also the *genetic
information* encoded in the shape (i.e., the 3-dimensional
structure) of an enzyme. In fact, it may be asserted that the
shapes of enzymes are primary in the phenomenon of enzyme
catalysis in that shapes carry not only genetic information
but also free energy (e.g., the energized myosin head), so that
their changes can drive catalytic acts, including the
translocation and transformation of bound ligands. The
combination of *free energy* and *genetic information* in the form of *sequence-specific conformational strains of
enzymes* (*local energized shapes, LES*) (and other
biopolymers such as DNA) was given the name *conformons* in 1972. During the past three decades, conformons (*LES*)
have been found to provide the necessary and sufficient
conditions to account for many goal-directed molecular
processes inside the cell [Ji, S., BioSystems **54**:
107-130 (2000)].

The *conformon* theory will be applied to the
mechanisms of action of the glucose carrier and the Na^{+}/K^{+} ATPase. A preliminary attempt will be made to represent the
conformon model of the Na^{+}/K^{+} ATPase using the *stochastic pi-calculus* described by G. Ciobanu [2000, 2001, 2002], in order to
establish a fundamental link between *molecular biology *and *computer science*. Such a fundamental link was suggested
by the isomorphism postulated in 1991 to exist between the
cell and cellular automata, which appears to be in
agreement with the (weak version of the) *Principle of
Computational Equivalence* recently formulated by S.
Wolfram in *A New Kind of Science* [Wolfram Media, Inc.,
Urbana-Champaign, 2002].

**Exploiting Conserved Synteny in
Genome-by-Genome Ortholog Mapping**

*Phil Long, Genome Institute of Singapore*

(Joint work with K.R.K. Murthy, V. Vega and E. Liu)

Pairing genes in a lower organism with their equivalent counterparts in humans (their orthologs) is an important step in the investigation of the molecular basis of human disease. The most prominent high-throughput, fully automated methods for ortholog pairing predict whether a pair of genes are orthologs based on the similarity between the DNA or RNA sequences associated with them, possibly supplemented by comparison with a sequence associated with a related gene in a third organism.

Rearrangements of genomes during evolution often leave long stretches of DNA intact. Thus, neighboring genes often travel together through evolution, possibly undergoing mutations along the way (when this happens, it is called conserved synteny). As a result, when assessing whether a pair of genes are orthologs, supporting evidence can be obtained by examining whether genes nearby on their respective chromosomes have similar sequences.

We describe a method for incorporating the evidence due to conserved synteny into a high-throughput system for predicting which pairs of genes in two genomes are orthologs. We provide evidence using the human and mouse genomes that using conserved synteny in an ortholog pairing system as we propose results in substantial improvement in accuracy.
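The general idea can be sketched in a few lines: treat genes as positions along their chromosomes, and add a bonus to a sequence-similarity score for each nearby pair of genes that is itself similar. Everything here (the window size, the 0.5 similarity cutoff, the additive weighting) is a hypothetical illustration, not the authors' actual scoring scheme.

```python
def synteny_support(pair, neighbor_sim, window=2):
    """Count neighboring gene pairs (within `window` positions on each
    chromosome) whose sequence similarity exceeds a cutoff, as evidence
    of conserved synteny around the candidate pair."""
    g1, g2 = pair
    support = 0
    for d1 in range(-window, window + 1):
        for d2 in range(-window, window + 1):
            if (d1, d2) != (0, 0) and neighbor_sim.get((g1 + d1, g2 + d2), 0.0) > 0.5:
                support += 1
    return support

def combined_score(seq_sim, support, alpha=0.1):
    """Sequence similarity plus a small bonus per supporting neighbor."""
    return seq_sim + alpha * support
```

In this toy encoding, `neighbor_sim` maps (position on chromosome 1, position on chromosome 2) pairs to similarity scores; a candidate pair flanked by similar neighbors scores higher than one with the same sequence similarity but no syntenic support.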

**Endogenous fluctuations in gene
regulation**

*Thomas Kepler, Santa Fe Institute, USA*

The regulation of gene expression plays a fundamental role in the dynamics of cellular life. These processes are subject to significant stochasticity due partly to the random waiting times among synthesis and degradation reactions involving a finite collection of transcripts. Additional stochasticity is attributable to the random transitions among the discrete operator states controlling the rate of transcription. This innate stochasticity can have quantitative and qualitative impact on the behavior of gene-regulatory networks. We develop a Markov model to which these random reactions are intrinsic, as well as a series of simpler models derived explicitly from the first as approximations in different parameter regimes. For their analysis, we introduce a natural generalization of deterministic bifurcations for the classification of stochastic systems. We show that simple noisy genetic switches have rich bifurcation structures; among them, bifurcations driven solely by changing the rate of operator fluctuations even as the “underlying” deterministic system remains unchanged. We find stochastic bistability where the deterministic equations predict monostability and vice versa. We derive and solve equations for the mean waiting times for spontaneous transitions between quasistable states in these switches.
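A minimal Gillespie-type simulation conveys the kind of intrinsic stochasticity described above: a two-state operator gates transcription, and transcripts degrade at a constant per-molecule rate. The rate constants are hypothetical, and this sketch illustrates the general class of Markov models discussed, not the speaker's actual formulation.

```python
import random

def gillespie_gene(t_max, k_on=0.1, k_off=0.1, k_tx=10.0, k_deg=1.0, seed=0):
    """Exact stochastic simulation of: operator OFF<->ON (k_on, k_off),
    transcription ON -> ON + m (rate k_tx while ON), degradation m -> 0
    (rate k_deg per transcript). Returns the transcript count at t_max."""
    rng = random.Random(seed)
    t, on, m = 0.0, False, 0
    while True:
        rates = [k_off if on else k_on,   # operator flips state
                 k_tx if on else 0.0,     # one transcript is made
                 k_deg * m]               # one transcript decays
        total = sum(rates)
        t += rng.expovariate(total)       # waiting time to the next reaction
        if t >= t_max:
            return m
        r = rng.random() * total          # pick a reaction proportional to its rate
        if r < rates[0]:
            on = not on
        elif r < rates[0] + rates[1]:
            m += 1
        else:
            m -= 1
```

Slowing the operator flipping rates (k_on, k_off) relative to transcription makes the transcript count distribution bimodal even though the mean-field equations are monostable, which is the flavor of noise-driven bifurcation the abstract describes.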

**Modeling and inference with random
processes and the minimum description length criterion**

*Tom Kepler, Santa Fe Institute, USA*

In the statistical analysis of data, the variability of the data is partitioned into regularity and randomness. The variability classified as regularity then becomes the object of further inquiry; it is the part that must be explained. That classified as random is discarded, since there is, by definition, nothing to explain. In practice, this partitioning occurs relative to a particular model; the regularities are embodied in advance by the model and the residuals are whatever remains unexplained by the model. In biology, we are now often faced with the analysis of large, structured datasets. It is more often the case than not that the form of the model cannot be discerned in advance. Nevertheless, the partitioning of variability remains a critical step in the discovery of patterns as well as in generalized model comparison and hypothesis testing.

Minimum description length (MDL) techniques provide an information-theoretic framework for model comparison and data analysis based on the minimization of total description length for the model and residuals together rather than on null-hypothesis significance testing. I will describe our efforts to extend the MDL method to Gaussian process models, for which the models themselves are random, and are estimated essentially by Bayes' rule. I will provide examples of the use of this modeling technique for the analysis of DNA sequence data and other biological systems.
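A generic two-part code illustrates the MDL principle (this is not the Gaussian-process formulation described in the talk): the description length of a k-parameter regression model on n points is approximated by a parameter cost of (k/2)·log₂ n bits plus a residual code length of (n/2)·log₂(RSS/n) bits, a standard asymptotic approximation. The numbers below are hypothetical.

```python
import math

def description_length(rss, n, k):
    """Crude two-part MDL score for a k-parameter model fit to n points
    with residual sum of squares rss: parameter cost plus residual code
    length (an asymptotic approximation, for illustration only)."""
    return 0.5 * k * math.log2(n) + 0.5 * n * math.log2(rss / n)

# A slightly worse fit can win if it spends far fewer bits on parameters:
simple = description_length(rss=12.0, n=100, k=2)
complex_ = description_length(rss=11.5, n=100, k=10)
print(simple < complex_)  # True: the simpler model has the shorter total code
```

Model comparison then selects the hypothesis with the smaller total description length, in place of null-hypothesis significance testing.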

**Process algebra and model checking
in molecular biology**

*Gabriel Ciobanu, National University of Singapore*

This talk presents a discrete mathematical description of the cellular process of the sodium-potassium exchange pump in terms of the pi-calculus process algebra. The equations of the Albert-Post model are translated into an appropriate operational semantics which can describe both the protein interactions (conformational transformations) and the membrane transport occurring in the pump mechanism. In this way a computational model is obtained, whose properties can be automatically checked. We motivate the use of the pi-calculus as an adequate formalism for molecular processes by describing the dynamics of the sodium-potassium exchange pump, an important physiological process present in all animal cells. This molecular process involves phenomena related to distribution and cooperation, but to mobility and adaptability as well. Using the stochastic pi-calculus, we describe the molecular interactions and conformational transformations in an explicit way. We manipulate the changing conformations formally and describe the corresponding dynamic systems using discrete mathematics instead of the usual partial differential equations. The transfer mechanisms are described in more detail, step by step. Moreover, we can use software tools to verify properties of the described systems.

**Discrete models of the immune system**

*Santo Motta, Dept. of Mathematics and Computer Science,
University of Catania, Italy*

The first part of the talk presents a brief review of the mathematical frameworks of Immune System models, pointing out advantages and disadvantages of the two major approaches, namely continuous and discrete models. The second part introduces the concept and characteristics of cellular automata. The third part presents the main characteristics of the Celada-Seiden model of the Immune System. This model, based on cellular automata, is very rich in biological detail. Finally, the last part shows how one can use an Immune System model to perform pattern recognition based on the Immune algorithm.
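The cellular-automaton formalism underlying such models can be conveyed with a generic one-dimensional synchronous update loop. This is only the bare formalism; the Celada-Seiden model uses far richer cell states and stochastic, biologically motivated rules.

```python
def ca_step(cells, rule):
    """One synchronous update of a 1-D binary cellular automaton with
    periodic boundaries; `rule` maps (left, centre, right) -> new state."""
    n = len(cells)
    return [rule(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])
            for i in range(n)]

# Elementary rule 110 as an example local rule
RULE110 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 1, (1, 0, 0): 0,
           (0, 1, 1): 1, (0, 1, 0): 1, (0, 0, 1): 1, (0, 0, 0): 0}
print(ca_step([0, 0, 1, 0, 0], lambda l, c, r: RULE110[(l, c, r)]))
# [0, 1, 1, 0, 0]
```

Discrete immune-system models replace the binary state with a structured state (cell type, receptor shape, bound antigen) and the lookup table with interaction rules, but the synchronous local-update scheme is the same.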

**A new kind of science**

*Stephen Wolfram, Wolfram Research, Inc., USA*

The recent release of Stephen Wolfram's book A NEW KIND OF SCIENCE (http://www.wolframscience.com/nks/) has created an immense wave of interest in the new intellectual structure that he presents. The book has been available to the scientific public for only a little more than a month now, and so the process of careful examination and application of Wolfram's work is in its early stages. In this talk Stephen Wolfram will provide an overview of the key ideas and discoveries in NKS and discuss opportunities and directions of relevance to the community working on mathematical models of biological systems and processes. This community is well poised to be an early adopter of the research direction set out by Wolfram's work. The talk will end with a question and answer session with Dr. Wolfram.