10) Theoretical approaches to systems biology I and II (2 minisymposia)
Principal organizers: Dr. Brian Ingalls (University of Waterloo) and Dr. John Parkinson (Sick Kids, Toronto)
The emergence of high-throughput omic technologies
is leading to the generation of vast amounts of data detailing
not only the genes and their products but also their interactions
and dynamics. Consequently, genes are no longer being studied
in isolation, but are rather being treated as entities in
highly intricate biological systems. The non-intuitive behavior
of these systems, which arises through their inherent complexity,
provides exciting new opportunities for mathematical biology.
The theoretical biology community is attempting to address
these needs by applying existing tools from
dynamical systems theory, graph theory, systems theory and
control theory.
The mini-symposia will draw together investigators from both
the theoretical and experimental sides of current systems
biology research. The diversity of the systems biology community
demands interdisciplinary meetings as forums for researchers
from disparate backgrounds to discuss the challenges and share
ideas. Given that the topics addressed resonate with many other
areas in mathematical biology, we expect the proposed mini-symposia
will be of broad interest to attendees at the SMB meeting.
| Minisymposium I: | Minisymposium II: |
| Quaid Morris (University of Toronto) | Baltazar Aguda (Mathematical Biosciences Institute, Columbus) |
| Chad Myers (University of Minnesota) | Brian Ingalls (University of Waterloo) |
| John Parkinson (Sick Kids, Toronto) | Santiago Schnell (Indiana University) |
| Jason Papin (University of Virginia) | Vahid Shahrezaei (McGill University) |
Minisymposium I:
Quaid Morris
GeneMANIA: Fast integration of large-scale biological datasets for gene function prediction
Our goal is to help the average, computationally-naïve biologist
to take advantage of the wealth of publicly available biological
data from large-scale experiments, like microarray profiling studies.
Toward this goal, we are developing an easy-to-use web interface
that will allow biologists to generate hypotheses about the
function of uncharacterized genes by querying a set of automatically-updated
databases derived from large-scale biological experiments. Queries
take the form of lists of genes that share a function of interest
and a list of the datasets to search. The output of the interface
is a sorted list of genes likely to share function with those
in the input list.
To generate the output of our interface, we train a classifier
whose input is a set of weighted graphs each representing a
different
biological dataset and whose training labels are based on the
input list of genes. In order for our web interface to be interactive,
we have only about ten seconds to both train our classifier
and assign discriminant values for a combined training and test
set that contains about 20,000 genes. To solve this problem,
we have developed the GeneMANIA algorithm which does label propagation
on a composite graph formed from a convex combination of the
adjacency matrices of the input graphs. Unlike previous work
(Tsuda et al, 2005), we optimize the adjacency matrix weights
before doing the classification, resulting in a speed-up of
almost two orders of magnitude without any reduction in prediction
accuracy.
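As a rough illustration of the idea described above, the following sketch combines several weighted graphs into one composite graph with a convex combination and then propagates labels over it. It is not the GeneMANIA implementation: the network weights are fixed by hand rather than optimized, the propagation rule is a generic one (a Jacobi iteration for the linear system (I + D - W) f = y, assuming W has a zero diagonal), and the function names and toy graphs are hypothetical.

```python
def combine_graphs(adjacency_matrices, weights):
    """Convex combination of same-sized adjacency matrices (weights >= 0, sum to 1)."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1.0) < 1e-9
    n = len(adjacency_matrices[0])
    return [[sum(w * A[i][j] for w, A in zip(weights, adjacency_matrices))
             for j in range(n)] for i in range(n)]


def propagate_labels(W, labels, sweeps=200):
    """Approximately solve (I + D - W) f = labels by Jacobi iteration.

    D is the diagonal matrix of row sums of W; W is assumed symmetric,
    non-negative and zero on the diagonal, which makes the iteration converge.
    """
    n = len(W)
    degree = [sum(row) for row in W]
    f = list(labels)
    for _ in range(sweeps):
        f = [(labels[i] + sum(W[i][j] * f[j] for j in range(n))) / (1.0 + degree[i])
             for i in range(n)]
    return f


# Toy example with five genes and two hypothetical datasets.
# Dataset A links genes 0-1 and 1-2; dataset B links genes 3-4.
graph_a = [[0, 1, 0, 0, 0],
           [1, 0, 1, 0, 0],
           [0, 1, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]
graph_b = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 1],
           [0, 0, 0, 1, 0]]

# Genes 0 and 1 are in the query list (+1), gene 4 is a negative example (-1),
# genes 2 and 3 are unlabeled (0).
labels = [1.0, 1.0, 0.0, 0.0, -1.0]

composite = combine_graphs([graph_a, graph_b], weights=[0.5, 0.5])  # hand-picked weights
scores = propagate_labels(composite, labels)

# Rank the unlabeled genes by their discriminant value, highest first.
ranking = sorted([2, 3], key=lambda g: scores[g], reverse=True)
```

In this toy setting the unlabeled gene that is linked to a query gene receives a positive score and the one linked to the negative example receives a negative score, which is the behavior the sorted output list relies on.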
John Parkinson
Adventures beyond the Bacteriome: The evolution of functional
modularity in bacterial protein interaction networks
It is widely appreciated that genes and proteins do not operate
in isolation, but form components of highly integrated biological
processes. Identifying the connections between these components
is therefore critical to understanding how these processes are
organized and function. /E. coli/ is the leading model bacterium;
however, despite its importance in biological and medical discovery,
a lack of large scale high quality interaction data has largely
precluded a global systems view of its genes and
protein products. Adopting a Bayesian framework, we have created
a single, highly reliable network that encompasses almost 50%
of the /E. coli/ proteome, validated through rigorous statistical
tests. Combining this network with a recently generated data
set of experimentally determined interactions, we have systematically
organized these data into discrete functional modules to provide
a comprehensive overview of the modular organization of the
/E. coli/ proteome. Functional, structural and comparative genomic
analyses are beginning to provide global insights into the relationships
of proteins from different functional classes, the evolution
of the network, the integration of laterally transferred genes
and the role of protein domains in the organization of the network.
Studies such as these are expected to pave the way for an exciting
new era of bioengineering in which synthetic biologists are
able to draw on these defined functional modules to design novel
biological constructs.
Jason Papin
Systems Biology of Infectious Disease
Infectious disease is a tremendous global health problem. Systems biology
promises to integrate high-throughput data in a mathematical
context to make the connection between genotype and phenotype,
and thus may facilitate the rapid identification of drug targets
for emerging pathogens. Two topics will be discussed: (1) the
development of novel computational approaches for interrogating
properties of mathematical representations of biochemical networks;
and (2) the discovery of fundamental biology with systems-level
models of human pathogens.
Delbert Dueck
Affinity Propagation: Clustering by passing messages between data points
How would you identify segments of DNA that reflect the expression
properties of genes? How would you select a small number
of yeast genes that act as a drug-response footprint? How would you
identify a subset of vaccine sequences that provide maximum
epitope coverage for an HIV genome population?
Data centers, or exemplars, are traditionally found by randomly
choosing an initial subset of data points and then iteratively
refining it, but this only works well if that initial choice
is close to a good solution. Affinity propagation is a new algorithm
that takes as input measures of similarity between pairs of
data points and simultaneously considers all data points as
potential exemplars. Real-valued messages are exchanged between
data points until a high-quality set of exemplars and corresponding
clusters gradually emerges. We have used affinity propagation
to solve a variety of clustering problems and we found that
it uniformly found clusters with much lower error than those
found by other methods, and it did so in less than one-hundredth
the amount of time. Because of its simplicity, general applicability,
and performance, we believe affinity propagation will prove
to be of broad value in science and engineering.
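The message-passing scheme described above can be sketched in a few lines. The version below follows the commonly published responsibility and availability updates for affinity propagation, written in plain Python for readability rather than speed. The damping factor, iteration count, toy data and the preference value placed on the diagonal of the similarity matrix are illustrative assumptions, not settings taken from the talk.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each data point, the index of its exemplar.

    S[i][k] is the similarity of point i to candidate exemplar k;
    S[k][k] is the "preference" of point k for being an exemplar.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # Responsibility: how well suited k is as exemplar for i,
        # relative to the best competing candidate.
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=scores.__getitem__)
            first = scores[best]
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                competitor = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - competitor)
        # Availability: accumulated evidence that k should be an exemplar.
        for k in range(n):
            positive = [max(0.0, R[i][k]) for i in range(n)]
            positive[k] = R[k][k]  # the self-responsibility is not clipped
            total = sum(positive)
            for i in range(n):
                if i == k:
                    new = total - R[k][k]
                else:
                    new = min(0.0, total - positive[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    # Each point chooses the candidate with the largest combined message.
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]


# Toy data: two well-separated groups of points on a line.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
preference = -1.0  # assumed value; lower preferences give fewer exemplars
similarity = [[preference if i == k else -(points[i] - points[k]) ** 2
               for k in range(len(points))] for i in range(len(points))]

exemplars = affinity_propagation(similarity)
```

Note that no initial guess of exemplars is supplied: every point starts as an equally plausible candidate, and the number of clusters is controlled only through the preference values.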
Minisymposium II
Baltazar D. Aguda
Theoretical prediction of a microRNA-regulated oncogenic
zone between normal cell cycle and apoptosis
MicroRNAs (miRs) are 21-23 nucleotide non-coding RNAs that regulate the
expression of their target genes either by inhibiting the translation
or reducing the stability of their messages. To date, there
are over 700 known miRs in humans [1], and it is estimated that
at least a third of all human genes are under miR regulation
[2]. Sequence-based analysis commonly predicts hundreds, sometimes
thousands, of targets for a given miR, and a given gene
can be targeted by multiple miRs. When viewed at the level of
gene networks and at the scale of cellular pathways and processes,
the complexity of miR regulation becomes evident. It is
likewise evident that we now have to understand this miR layer
of regulation in the development of various cancers, as perturbations
of specific miRs have recently been demonstrated to be associated
with tumorigenesis [3].
I will discuss the regulation of the transcription factors E2F
and Myc by miR-17-92 (a polycistronic cluster of 7 miRs). The miR-17-92
cluster has been labeled as oncogenic because its overexpression
is associated with increased incidence of certain tumors; however,
the same miR cluster has also been shown to act as a tumor suppressor
because its deletion or downregulation is associated with increased
frequency of other cancers [4]. I have recently developed a
mathematical model that is able to explain this paradoxical
role of miR-17-92 in cancer development. We have shown earlier
that E2F and Myc are members of a node (in a model modular network)
that controls the coordination between cell proliferation and
apoptosis [5]. In the new model, an oncogenic zone
exists in an E2F-Myc phase diagram. This zone separates normal
cell proliferation and apoptosis, and miR-17-92 fine-tunes the
dynamics of the network so that this oncogenic zone is avoided
under normal conditions.
References
[1] See, for example, miRBase: http://microrna.sanger.ac.uk/
[2] Rajewsky N. (2006) microRNA target predictions in animals. Nat Genet. 38 Suppl:S8-13.
[3] Cho WC. (2007) OncomiRs: the discovery and progress of microRNAs in cancers. Mol Cancer. 6:60.
[4] Coller HA, Forman JJ, Legesse-Miller A. (2007) "Myc'ed messages": myc induces transcription of E2F1 while inhibiting its translation via a microRNA polycistron. PLoS Genet. 3(8):e146.
[5] Aguda BD, Algar CK. (2003) Structural analysis of the qualitative networks regulating the cell cycle and apoptosis. Cell Cycle 2:538-544.
Brian Ingalls
Exploiting Stoichiometric Structure for Steady State Analysis
The stoichiometry matrix reflects the redundancies that are typically
present in a metabolic network. It can be shown that steady-state
analysis depends only on the form of these redundancies
and so for such analysis much of the content of the stoichiometry
matrix is irrelevant. As a result, existing analytic techniques
can be streamlined. An additional consequence is the construction
of classes of networks which share the same stoichiometric properties,
including reduced versions of networks which are stoichiometrically
consistent. By reducing internal reactions in this manner while
retaining exchange fluxes, one arrives at a minimal description
of a metabolic input/output module.
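For readers less familiar with the setting, the sketch below shows the basic objects involved, under the assumption that the "redundancies" referred to are linear dependencies among the rows (conserved quantities) and columns (steady-state flux relations) of the stoichiometry matrix N. With concentrations changing as N times the reaction-rate vector v, a steady state requires N v = 0. The four-reaction network and the helper names are hypothetical and are not taken from the talk.

```python
from fractions import Fraction


def rank(matrix):
    """Rank of a matrix by exact Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r


def net_rates(N, v):
    """Net rate of change of each species, i.e. the product N v."""
    return [sum(n_ij * v_j for n_ij, v_j in zip(row, v)) for row in N]


# Hypothetical network (rows: S, E, ES, P; columns: v1..v4):
#   v1: -> S        v2: S + E -> ES
#   v3: ES -> E + P v4: P ->
N = [[1, -1,  0,  0],   # S
     [0, -1,  1,  0],   # E
     [0,  1, -1,  0],   # ES
     [0,  0,  1, -1]]   # P

r = rank(N)
conserved_quantities = len(N) - r      # row dependencies (here E + ES is constant)
free_fluxes = len(N[0]) - r            # dimension of the steady-state flux space

balanced = net_rates(N, [2, 2, 2, 2])      # all fluxes equal: every entry is zero
unbalanced = net_rates(N, [2, 1, 1, 1])    # uptake exceeds consumption of S
```

Only the dependency structure (one conserved quantity, one free steady-state flux) enters the steady-state condition here, which is in the spirit of the observation in the abstract.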
Santiago Schnell
A model of endoplasmic reticulum stress in pancreatic beta
cells
Pancreatic beta cell failure is increasingly recognized as central to the
progression of diabetes mellitus. Different causes are implicated
in the onset of beta cell stress, dysfunction or death. One of
these is endoplasmic reticulum (ER) stress resulting from the
misfolding of proinsulin, the insulin precursor. We have developed
a model of proinsulin maturation, misfolding and export to investigate
ER stress in beta cells. There are currently two prevailing
hypotheses in the field: ER stress can result either from deterioration
of the proinsulin folding pathways or from deterioration of
the transport proteins responsible for extracting insulin from
the ER. Our model suggests that neither hypothesis alone is responsible
for ER stress, but rather that both contribute to it.
Vahid Shahrezaei
Protein distributions under intrinsic and extrinsic fluctuations.
Gene expression is significantly stochastic, making modeling of genetic
networks challenging. This stochasticity arises because of both
inherent stochasticity in biochemistry (intrinsic fluctuations),
as well as interactions of the system of interest with other
stochastic systems in the cell or its environment (extrinsic
fluctuations). I first present an approximation that allows
the calculation of not only the steady-state mean and variance
but also the distributions of protein under only intrinsic fluctuations.
The non-Gaussian distributions derived are poorly characterized
by their mean and variance, which are usually the only quantities calculated.
Next, I discuss an extension to the standard stochastic simulation
algorithm to include extrinsic fluctuations. Using the proposed
algorithm, I demonstrate that both the timescales of extrinsic
fluctuations and their non-specificity substantially affect
the function and performance of biochemical networks.
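As background, the sketch below shows a minimal stochastic simulation with intrinsic fluctuations only, assuming that the "standard stochastic simulation algorithm" refers to Gillespie's direct method. It models a protein produced at a constant rate and degraded in proportion to its copy number. The extension to extrinsic fluctuations proposed in the talk is not reproduced, and the rate constants, function name and seed are illustrative assumptions.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, seed=0):
    """Simulate production (rate k_prod) and degradation (rate k_deg * n).

    Returns the time-averaged protein copy number over [0, t_end].
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    t, n = 0.0, 0
    weighted_sum = 0.0
    while t < t_end:
        a_prod = k_prod            # propensity of the production reaction
        a_deg = k_deg * n          # propensity of the degradation reaction
        a_total = a_prod + a_deg
        # Waiting time to the next reaction is exponentially distributed.
        dt = min(rng.expovariate(a_total), t_end - t)
        weighted_sum += n * dt
        t += dt
        if t >= t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            n += 1
        else:
            n -= 1
    return weighted_sum / t_end


# With these assumed rates the stationary mean copy number is k_prod / k_deg = 10.
mean_copy_number = gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=2000.0)
```

One simple way to explore extrinsic effects in such a sketch would be to let `k_prod` itself vary in time, but doing so correctly requires care with time-dependent propensities, which is the kind of issue the extension described in the abstract addresses.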
TALK CANCELLED
Chad Myers
Smart genomic discovery through computational models for
genomic data integration
Understanding the complexity and organization of biological systems and
how they relate to cellular function is a primary focus of
modern biology. Toward this goal, recent efforts in genomics
have focused on high-throughput measurement of several cellular
phenomena including gene expression, protein-protein interactions,
genetic interactions, protein localization and sequence. The
wealth of data generated by such studies promises to support
computational prediction of network models based on the integration
of these diverse data, which has been an intense focus of
the computational community in recent years. While several
successful approaches to the network inference problem have
been proposed, we argue that our community has had relatively
limited success in translating predictive models into testable
hypotheses that are actually validated or refuted by experimental
studies. If our computational studies are so successful, why
do we have trouble convincing biologists to use them to direct
large-scale laboratory efforts?
We explore this issue in the context of our recent research
on computational approaches to inferring networks from diverse
genomic data. We highlight a promising new direction for computational
research, which is to build predictive models that couple
predictions with specific experimental systems. We argue that
such models can make
effective use of the enormous repositories of existing genomic
data to dramatically improve the efficiency with which we
apply experimental genomic technology and, ultimately, discover
new biology. We discuss two case examples where we have used
such an approach to drive several months of experimental investigation:
one to discover nearly 100 proteins required for yeast mitochondrial
function, and another in targeting large-scale combinatorial
perturbation screens in yeast.