July 30-August 2, 2008
Society for Mathematical Biology Conference

hosted by the Centre for Mathematical Medicine, Fields Institute
held at University of Toronto, Medical Sciences Bldg



10) Theoretical approaches to systems biology I and II (2 minisymposia)
Principal organizers:
Dr. Brian Ingalls (University of Waterloo) and Dr. John Parkinson (Sick Kids, Toronto)

The emergence of high-throughput ‘omic’ technologies is leading to the generation of vast amounts of data detailing not only the genes and their products but also their interactions and dynamics. Consequently, genes are no longer being studied in isolation, but are rather being treated as entities in highly intricate biological systems. The non-intuitive behavior of these systems, which arises from their inherent complexity, provides exciting new opportunities for mathematical biology. The theoretical biology community is attempting to address these needs through the application of existing tools from dynamical systems theory, graph theory, systems theory and control theory.

The mini-symposia will draw together investigators from both the theoretical and experimental sides of current systems biology research. The diversity of the systems biology community demands interdisciplinary meetings as forums for researchers from disparate backgrounds to discuss the challenges and share ideas. Given that the topics addressed resonate with many other areas in mathematical biology, we expect the proposed mini-symposia will be of broad interest to attendees at the SMB meeting.

Minisymposium I:

Quaid Morris (University of Toronto)
Chad Myers (University of Minnesota)
John Parkinson (Sick Kids, Toronto)
Jason Papin (University of Virginia)

Minisymposium II:

Baltazar Aguda (Mathematical Biosciences Institute, Columbus)
Brian Ingalls (University of Waterloo)
Santiago Schnell (Indiana University)
Vahid Shahrezaei (McGill University)

Minisymposium I:

Quaid Morris
GeneMANIA: Fast integration of large-scale biological datasets for gene function prediction

Our goal is to help the average, computationally-naïve biologist to take advantage of the wealth of publicly available biological data from large-scale experiments, like microarray profiling studies. Toward this goal, we are developing an easy-to-use web interface that will allow biologists to generate hypotheses about the function of uncharacterized genes by querying a set of automatically-updated databases derived from large-scale biological experiments. Queries take the form of lists of genes that share a function of interest and a list of the datasets to search. The output of the interface is a sorted list of genes likely to share function with those in the input list.

To generate the output of our interface, we train a classifier whose input is a set of weighted graphs each representing a different biological dataset and whose training labels are based on the input list of genes. In order for our web interface to be interactive, we have only about ten seconds to both train our classifier and assign discriminant values for a combined training and test set that contains about 20,000 genes. To solve this problem, we have developed the GeneMANIA algorithm, which does label propagation on a composite graph formed from a convex combination of the adjacency matrices of the input graphs. Unlike previous work (Tsuda et al., 2005), we optimize the adjacency matrix weights before doing the classification, resulting in a speed-up of almost two orders of magnitude without any reduction in prediction accuracy.
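
The composite-graph idea in this abstract can be illustrated with a small sketch. It is not the GeneMANIA implementation: the network weights are fixed by hand rather than optimized, the propagation rule is a generic iterative scheme (f <- alpha * S f + (1 - alpha) * y, with S the symmetrically normalized adjacency matrix), and the function names, toy networks, and the value of alpha are all assumptions made for the example.

```python
def combine_graphs(adjacencies, weights):
    """Convex combination of equally sized adjacency matrices (lists of lists)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(adjacencies[0])
    return [[sum(w * a[i][j] for w, a in zip(weights, adjacencies))
             for j in range(n)] for i in range(n)]


def propagate_labels(adjacency, labels, alpha=0.8, iterations=200):
    """Generic label propagation: repeat f <- alpha * S f + (1 - alpha) * y."""
    n = len(adjacency)
    # Guard against isolated nodes by treating a zero degree as 1.
    degree = [sum(row) or 1.0 for row in adjacency]
    s = [[adjacency[i][j] / (degree[i] * degree[j]) ** 0.5 for j in range(n)]
         for i in range(n)]
    scores = list(labels)
    for _ in range(iterations):
        scores = [alpha * sum(s[i][j] * scores[j] for j in range(n))
                  + (1 - alpha) * labels[i] for i in range(n)]
    return scores


# Two hypothetical networks over five genes (symmetric, unweighted edges).
net_a = [[0, 1, 0, 0, 0],
         [1, 0, 1, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]
net_b = [[0, 0, 1, 0, 0],
         [0, 0, 0, 0, 0],
         [1, 0, 0, 0, 0],
         [0, 0, 0, 0, 1],
         [0, 0, 0, 1, 0]]

# Gene 0 is a query (positive) gene, gene 4 a negative example, the rest unlabeled.
labels = [1.0, 0.0, 0.0, 0.0, -1.0]
composite = combine_graphs([net_a, net_b], [0.5, 0.5])
scores = propagate_labels(composite, labels)
ranking = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
```

In this toy case the unlabeled genes connected to the query gene in the composite graph receive positive scores, while the gene linked only to the negative example receives a negative one, so sorting by score gives the kind of ranked gene list described above.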

John Parkinson
Adventures beyond the Bacteriome: The evolution of functional modularity in bacterial protein interaction networks

It is widely appreciated that genes and proteins do not operate in isolation, but form components of highly integrated biological processes. Identifying the connections between these components is therefore critical to understanding how these processes are organized and function. /E. coli/ is the leading model bacterium; however, despite its importance in biological and medical discovery, a lack of large-scale, high-quality interaction data has largely precluded a global ‘systems’ view of its genes and protein products. Adopting a Bayesian framework, we have created a single, highly reliable network that encompasses almost 50% of the /E. coli/ proteome, validated through rigorous statistical tests. Combining this network with a recently generated data set of experimentally determined interactions, we have systematically organized these data into discrete functional modules to provide a comprehensive overview of the modular organization of the /E. coli/ proteome. Functional, structural and comparative genomic analyses are beginning to provide global insights into the relationships of proteins from different functional classes, the evolution of the network, the integration of laterally transferred genes and the role of protein domains in the organization of the network. Studies such as these are expected to pave the way for an exciting new era of bioengineering in which synthetic biologists are able to draw on these defined functional modules to design novel biological constructs.

Jason Papin
Systems Biology of Infectious Disease

Infectious disease is a tremendous global health problem. Systems biology promises to integrate high-throughput data in a mathematical context to make the connection between genotype and phenotype, and thus may facilitate the rapid identification of drug targets for emerging pathogens. Two topics will be discussed: (1) the development of novel computational approaches for interrogating properties of mathematical representations of biochemical networks; and (2) the discovery of fundamental biology with systems-level models of human pathogens.

Delbert Dueck
Affinity Propagation: Clustering by passing messages between data points

How would you identify segments of DNA that reflect the expression properties of genes? How would you select a small number of yeast genes that act as a drug-response footprint? How would you identify a subset of vaccine sequences that provide maximum epitope coverage for an HIV genome population?
Data centers, or exemplars, are traditionally found by randomly choosing an initial subset of data points and then iteratively refining it, but this only works well if that initial choice is close to a good solution. Affinity propagation is a new algorithm that takes as input measures of similarity between pairs of data points and simultaneously considers all data points as potential exemplars. Real-valued messages are exchanged between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We have used affinity propagation to solve a variety of clustering problems and found that it uniformly produced clusters with much lower error than those found by other methods, and did so in less than one-hundredth the amount of time. Because of its simplicity, general applicability, and performance, we believe affinity propagation will prove to be of broad value in science and engineering.
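
The message passing described above can be sketched in a few lines. This is a minimal, unoptimized illustration of the commonly published responsibility and availability updates with damping, not the authors' code. The toy points, the use of negative squared distance as similarity, the median preference, the damping value, and the fixed iteration count are assumptions made for the example.

```python
from statistics import median


def affinity_propagation(similarity, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it chooses."""
    n = len(similarity)
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i, k) <- s(i, k) - max over k' != k of [a(i, k') + s(i, k')]
        for i in range(n):
            for k in range(n):
                rival = max(a[i][j] + similarity[i][j]
                            for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (similarity[i][k] - rival)
        # a(i, k) <- min(0, r(k, k) + sum of positive r(i', k), i' not in {i, k})
        # a(k, k) <- sum of positive r(i', k), i' != k
        for k in range(n):
            positive = [max(0.0, r[i][k]) for i in range(n)]
            total = sum(positive)
            for i in range(n):
                if i == k:
                    new = total - positive[k]
                else:
                    new = min(0.0, r[k][k] + total - positive[i] - positive[k])
                a[i][k] = damping * a[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: a[i][k] + r[i][k]) for i in range(n)]


def negative_squared_distance(p, q):
    return -sum((x - y) ** 2 for x, y in zip(p, q))


# Hypothetical data: two well-separated groups of three points.
points = [(1, 2), (1, 4), (1, 0), (4, 2), (4, 4), (4, 0)]
n = len(points)
similarity = [[negative_squared_distance(p, q) for q in points] for p in points]

# The diagonal ("preference") controls how many exemplars emerge;
# the median off-diagonal similarity is one common choice.
preference = median(similarity[i][k] for i in range(n) for k in range(n) if i != k)
for i in range(n):
    similarity[i][i] = preference

exemplars = affinity_propagation(similarity)
```

Each entry of `exemplars` is the index of the data point chosen as that point's cluster center, so points sharing an exemplar form one cluster. No initial set of centers is chosen: every point starts as a candidate exemplar.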

Minisymposium II:

Baltazar D. Aguda
Theoretical prediction of a microRNA-regulated ‘oncogenic zone’ between normal cell cycle and apoptosis

MicroRNAs (miRs) are 21-23 nucleotide non-coding RNAs that regulate the expression of their target genes either by inhibiting the translation or reducing the stability of their messages. To date, there are over 700 known miRs in humans [1], and it is estimated that at least a third of all human genes are under miR regulation [2]. Sequence-based analysis commonly predicts hundreds, sometimes thousands, of targets for a given miR, and indicates that a given gene can be targeted by multiple miRs. When viewed at the level of gene networks and at the scale of cellular pathways and processes, the complexity of miR regulation becomes evident. It is likewise evident that we now have to understand this miR layer of regulation in the development of various cancers, as perturbations of specific miRs have recently been demonstrated to be associated with tumorigenesis [3].

I will discuss the regulation of the transcription factors E2F and Myc by miR-17-92 (a polycistronic cluster of 7 miRs). The miR-17-92 cluster has been labeled as oncogenic because its overexpression is associated with increased incidence of certain tumors; however, the same miR cluster has also been shown to act as a tumor suppressor because its deletion or downregulation is associated with increased frequency of other cancers [4]. I have recently developed a mathematical model that is able to explain this paradoxical role of miR-17-92 in cancer development. We have shown earlier that E2F and Myc are members of a node (in a model modular network) that controls the coordination between cell proliferation and apoptosis [5]. In the new model, an ‘oncogenic zone’ exists in an E2F-Myc phase diagram. This zone separates normal cell proliferation and apoptosis, and miR-17-92 fine-tunes the dynamics of the network so that this oncogenic zone is avoided under normal conditions.


[1] See, for example, miRBase: http://microrna.sanger.ac.uk/
[2] Rajewsky N. (2006) microRNA target predictions in animals. Nat Genet. 38 Suppl:S8-13.
[3] Cho WC. (2007) OncomiRs: the discovery and progress of microRNAs in cancers. Mol Cancer. 6:60.
[4] Coller HA, Forman JJ, Legesse-Miller A. (2007) "Myc'ed messages": myc induces transcription of E2F1 while inhibiting its translation via a microRNA polycistron. PLoS Genet. 3(8):e146.
[5] Aguda BD, Algar CK. (2003) Structural analysis of the qualitative networks regulating the cell cycle and apoptosis. Cell Cycle 2:538-544.

Brian Ingalls
Exploiting Stoichiometric Structure for Steady State Analysis

The stoichiometry matrix reflects the redundancies that are typically present in a metabolic network. It can be shown that steady-state analysis depends only on the form of these redundancies and so for such analysis much of the content of the stoichiometry matrix is irrelevant. As a result, existing analytic techniques can be streamlined. An additional consequence is the construction of classes of networks which share the same stoichiometric properties, including reduced versions of networks which are stoichiometrically consistent. By reducing internal reactions in this manner while retaining exchange fluxes, one arrives at a minimal description of a metabolic input/output module.
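
As background for this abstract, the sketch below shows how the two null spaces of a stoichiometry matrix N expose its structure: the right null space (N v = 0) gives the steady-state flux patterns, and the left null space gives conservation relations among species. This is a generic textbook calculation, not the method presented in the talk, and the four-species enzyme network and all names are hypothetical.

```python
from fractions import Fraction


def null_space(matrix):
    """Basis of {x : matrix x = 0}, by exact Gauss-Jordan elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_cols = len(rows[0])
    pivots = []
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][c]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        vec = [Fraction(0)] * n_cols
        vec[free] = Fraction(1)
        for row_index, p in enumerate(pivots):
            vec[p] = -rows[row_index][free]
        basis.append(vec)
    return basis


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


# Hypothetical network: -> S,  S + E -> ES,  ES -> E + P,  P ->
# Rows are species (S, E, ES, P); columns are reactions (v0, v1, v2, v3).
N = [[1, -1,  0,  0],
     [0, -1,  1,  0],
     [0,  1, -1,  0],
     [0,  0,  1, -1]]

steady_state_fluxes = null_space(N)           # flux vectors with N v = 0
conservation_laws = null_space(transpose(N))  # row combinations that vanish
```

For this toy network the only steady-state flux pattern carries equal flux through all four reactions, and the only conservation relation is that E + ES is constant. The rows for E and ES are linearly dependent, which is one simple example of the kind of redundancy a stoichiometry matrix can contain.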

Santiago Schnell
A model of endoplasmic reticulum stress in pancreatic beta cells

Pancreatic beta cell failure is increasingly recognized as central to the progression of diabetes mellitus. Different causes are implicated in the onset of beta cell stress, dysfunction or death. One of these is endoplasmic reticulum (ER) stress resulting from the misfolding of proinsulin, the insulin precursor. We have developed a model of proinsulin maturation, misfolding and export to investigate ER stress in beta cells. There are currently two prevailing hypotheses in the field: ER stress can result either from deterioration of the proinsulin folding pathways or from deterioration of the transported proteins responsible for extracting insulin from the ER. Our model suggests that neither hypothesis alone is responsible for ER stress, but rather that both contribute to it.

Vahid Shahrezaei
Protein distributions under intrinsic and extrinsic fluctuations

Gene expression is significantly stochastic, making modeling of genetic networks challenging. This stochasticity arises because of both inherent stochasticity in biochemistry (intrinsic fluctuations) and interactions of the system of interest with other stochastic systems in the cell or its environment (extrinsic fluctuations). I first present an approximation that allows the calculation of not only the steady-state mean and variance but also the distributions of protein under only intrinsic fluctuations. The non-Gaussian distributions derived are poorly characterized by their mean and variance, which are usually the only quantities calculated. Next, I discuss an extension to the standard stochastic simulation algorithm to include extrinsic fluctuations. Using the proposed algorithm, I demonstrate that both the timescales of extrinsic fluctuations and their non-specificity substantially affect the function and performance of biochemical networks.
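
The distinction between intrinsic and extrinsic fluctuations can be illustrated with a deliberately simplified sketch. It uses the standard Gillespie stochastic simulation algorithm for a birth-death protein model, and mimics extrinsic variation by drawing the production rate once per simulated cell from a log-normal distribution. That static treatment is an assumption made for brevity; it is not the extension proposed in the talk, which concerns extrinsic fluctuations that vary in time. All rate values and names are hypothetical.

```python
import random
from statistics import mean, pvariance


def simulate_protein(production, degradation, t_end, rng):
    """Gillespie simulation of 0 -> P (rate production) and P -> 0 (rate degradation * n)."""
    t, n = 0.0, 0
    while True:
        birth = production
        death = degradation * n
        total = birth + death
        t += rng.expovariate(total)      # waiting time to the next reaction
        if t > t_end:
            return n
        if rng.random() * total < birth:  # choose which reaction fires
            n += 1
        else:
            n -= 1


def sample_cells(n_cells, extrinsic_sigma, rng,
                 production=20.0, degradation=1.0, t_end=10.0):
    """Protein counts at t_end for independent cells."""
    counts = []
    for _ in range(n_cells):
        rate = production
        if extrinsic_sigma > 0:
            # Assumed form of extrinsic noise: a per-cell log-normal factor.
            rate *= rng.lognormvariate(0.0, extrinsic_sigma)
        counts.append(simulate_protein(rate, degradation, t_end, rng))
    return counts


def fano_factor(samples):
    """Variance divided by mean; equals 1 for a Poisson distribution."""
    return pvariance(samples) / mean(samples)


rng = random.Random(0)
intrinsic_only = sample_cells(500, 0.0, rng)
with_extrinsic = sample_cells(500, 0.3, rng)
fano_intrinsic = fano_factor(intrinsic_only)
fano_extrinsic = fano_factor(with_extrinsic)
```

With intrinsic noise alone this model gives Poisson-distributed protein numbers, so the Fano factor should be close to 1. Adding cell-to-cell variation in the production rate broadens the distribution and raises the Fano factor well above 1.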

Chad Myers
Smart genomic discovery through computational models for genomic data integration

Understanding the complexity and organization of biological systems and how they relate to cellular function is a primary focus of modern biology. Toward this goal, recent efforts in genomics have focused on high-throughput measurement of several cellular phenomena including gene expression, protein-protein interactions, genetic interactions, protein localization and sequence. The wealth of data generated by such studies promises to support computational prediction of network models based on the integration of these diverse data, which has been an intense focus of the computational community in recent years. While several successful approaches to the network inference problem have been proposed, we argue that our community has had relatively limited success in translating predictive models into testable hypotheses that are actually validated or refuted by experimental studies. If our computational studies are so successful, why do we have trouble convincing biologists to use them to direct large-scale laboratory efforts?

We explore this issue in the context of our recent research on computational approaches to inferring networks from diverse genomic data. We highlight a promising new direction for computational research, which is to build predictive models that couple predictions with specific experimental systems. We argue that such models can make effective use of the enormous repositories of existing genomic data to dramatically improve the efficiency with which we apply experimental genomic technology and, ultimately, discover new biology. We discuss two case examples where we have used such an approach to drive several months of experimental investigation: one to discover nearly 100 proteins required for yeast mitochondrial function, and another in targeting large-scale combinatorial perturbation screens in yeast.

