Vincent J. Carey, Harvard Medical School
Genomic EDA and Modeling with R/Bioconductor
The Bioconductor project pursues the creation of flexible and portable tools for statistical analysis
of genomic data. I will describe Bioconductor facilities for exploratory
data analysis and flexible statistical inference with microarray data.
Particular examples will include classification of gene expression densities,
visualization and inference on genomic network structures, and flexible
methods for testing hypotheses about the roles of pathways and pathway
components in gene expression studies.
Christopher Field, Dalhousie University
Robustness Issues in Phylogeny
To estimate the tree structure for a set of taxa, we typically use a statistical
model for evolution and compute the maximum likelihood estimate. Molecular
biologists recognize that the model is a rough approximation to reality,
and there is considerable literature on the effects of model deviations.
In this talk, I will examine some of these deviations, paying particular
attention to the type of robustness methodology needed to successfully
estimate the tree and make reliable inference.
Greg Gloor, University of Western Ontario
Co-evolution and mutual information of amino acid positions in protein
Proteins are extremely complicated molecular machines that have evolved
to perform a particular cellular function. While knowing the structure
of a given protein often gives valuable insights into its function, there
are also many unanswered questions. This is because each structure is
a snapshot of one particular conformation of a protein isolated from one
individual species. In many instances functionally important amino acid
positions are conserved, but mutation studies show that many
non-conserved positions are equally important. We are
using mutual information to find these important, yet variable, amino
acid positions in protein families. I will describe our progress on this
project, and present some strengths and limitations of the current generation
of tools used to show the correspondence between structure and sequence.
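As a rough illustration of the quantity involved, mutual information between two alignment columns can be estimated from observed residue frequencies. The sketch below is not the speaker's implementation: it assumes a simple plug-in estimate with no correction for small samples, gaps, or phylogenetic bias, and the toy alignment and the function name mutual_information are invented for the example.

```python
from collections import Counter
from math import log2


def mutual_information(col_a, col_b):
    """Plug-in estimate of mutual information (in bits) between two
    alignment columns, given as equal-length sequences of residues."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must contain the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)              # residue counts at position a
    count_b = Counter(col_b)              # residue counts at position b
    count_ab = Counter(zip(col_a, col_b))  # joint counts of residue pairs
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * log2(p_ab * n * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical toy alignment: rows are sequences, columns are positions.
alignment = ["AKL", "AKL", "ARV", "ARV", "AKL", "ARV"]
columns = list(zip(*alignment))

# Position 0 is fully conserved, so it carries no information about others.
print(mutual_information(columns[0], columns[1]))
# Positions 1 and 2 vary, but always together.
print(mutual_information(columns[1], columns[2]))
```

In this toy case the conserved column scores zero against everything, while the two variable columns that change in step score one bit, which is the kind of variable but coupled position pair the abstract refers to.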
David Sankoff, University of Ottawa
Far-reaching effects of missing map data and local shuffling on the
inference of genome rearrangement history
Joint work with Phil Trinh. Until recently, algorithms for studying
the evolution of gene order could only be applied to small genomes (mitochondria,
chloroplasts, prokaryotes), the difficulty with mammalian and other larger
eukaryotic nuclear genomes lying not so much in their much greater length
but rather in the absence of comprehensive lists of genes and their orthologs.
Pavel Pevzner and Glen Tesler (PNAS 2003) have suggested a way to bypass
gene finding and ortholog identification by using the order of syntenic
blocks constructed solely from sequence data as input to a genome rearrangement
algorithm. The method focuses on major evolutionary events by glossing
over small block-internal rearrangements, and neglecting intervening blocks
smaller than a threshold length. This use of large "sanitized"
blocks, and the neglect of short blocks may, however, blur important parts
of the historical derivation of the genomes. We model the effects of eliminating
and amalgamating short blocks, concentrating on the summary statistic
of "breakpoint re-use" introduced by Pevzner and Tesler. They
did not conceive of this as an evolutionary distance, but in the context
of their protocol it effectively measures to what extent genomes have
diverged in becoming random permutations of blocks with respect to each
other. We use analytic and simulation methods to investigate breakpoint
re-use as a function of threshold size and of rearrangement parameters.
We discuss the implication of our findings for the comparison of mammalian
genomes and suggest a number of mathematical, algorithmic and statistical
lines for further developing the Pevzner-Tesler approach.
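To make the summary statistic concrete, here is a minimal sketch of how a breakpoint re-use rate could be computed once a rearrangement distance is known. It assumes the commonly used definition r = 2d/b, where b is the number of breakpoints between two block orders and d is the rearrangement distance; the abstract does not spell out the formula, so treat that definition as an assumption. The code counts b for a signed permutation of blocks but takes d as an input, since computing d requires a full rearrangement algorithm that is not shown. The function names are illustrative only.

```python
def count_breakpoints(signed_perm):
    """Count breakpoints of a signed permutation of blocks 1..n relative
    to the identity order. Adjacent entries x, y form a breakpoint unless
    y - x == 1 (this also accepts inverted runs such as -3, -2)."""
    n = len(signed_perm)
    extended = [0] + list(signed_perm) + [n + 1]  # frame with 0 and n+1
    return sum(1 for x, y in zip(extended, extended[1:]) if y - x != 1)


def breakpoint_reuse(distance, breakpoints):
    """Assumed definition: each rearrangement creates at most two
    breakpoints, so 2 * distance / breakpoints is 1 with no re-use and
    grows toward 2 as breakpoints are re-used."""
    if breakpoints == 0:
        raise ValueError("identical block orders have no breakpoints")
    return 2 * distance / breakpoints


# Hypothetical example: blocks 2 and 3 inverted as a unit (one inversion).
perm = [1, -3, -2, 4]
b = count_breakpoints(perm)
print(b, breakpoint_reuse(distance=1, breakpoints=b))
```

A single inversion that creates two fresh breakpoints gives a rate of 1, the no re-use baseline; values near 2 are what one would expect when the block orders look like random permutations of each other.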
David Tritchler, University of Toronto
A Spectral Clustering Method for Microarray Data
Joint work with Shafagh Fallah and Joseph Beyene. Cluster analysis is
a commonly used dimension reduction technique. This talk introduces a
clustering method motivated by a multivariate analysis of variance model
and computationally based on eigenanalysis (thus the term "spectral"
in the title). Our focus is on large problems, and we present the method
in the context of clustering genes and arrays using microarray expression
data. The computational algorithm for the method has complexity linear
in the number of genes.
Of the numerous methods for constructing clusters
from microarray data, many require that the number of clusters believed
present in the data be specified a priori, and in general judging
the appropriate number of clusters is problematic. We also introduce a
method for assessing the number of clusters exhibited in microarray data
based on the eigenvalues of a particular matrix.
Jean Yee Hwa Yang, University of California,
Statistical Issues in the Design of Microarray Experiments
Microarray experiments performed in many areas of biological sciences
generate large and complex multivariate datasets. This talk addresses
statistical design and analysis issues that are essential to improving
the efficiency and reliability of cDNA microarray experiments. We discuss
various considerations unique to the design of cDNA microarrays, and examine
how different types of replication affect design decisions. We calculate
variances of two classes of estimates of differential gene expression
based on log ratios of fluorescence intensities from cDNA microarray experiments:
direct estimates, using measurements from the same slide, and indirect
estimates, using measurements from different slides. These variances are
compared and numerical estimates are obtained from a small experiment.
Some qualitative and quantitative conclusions are drawn which have potential
relevance to the design of cDNA microarray experiments.
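The contrast between direct and indirect estimates can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not taken from the talk: every slide yields a log ratio with independent, normally distributed error of common standard deviation sigma, the indirect comparison goes through a single common reference sample R, and correlation between slides is ignored. Under those assumptions the direct estimate has variance sigma^2 and the indirect estimate, being a difference of two slide measurements, has variance 2 * sigma^2.

```python
import random
from statistics import variance


def simulate_estimates(true_log_ratio=1.0, sigma=0.5, n_rep=20000, seed=0):
    """Simulate direct and indirect estimates of log(A/B).

    Returns the sample variances (direct, indirect) over n_rep replicates.
    All parameter values are illustrative, not taken from real data.
    """
    rng = random.Random(seed)
    direct, indirect = [], []
    for _ in range(n_rep):
        # Direct: A and B hybridized together on one slide.
        direct.append(true_log_ratio + rng.gauss(0.0, sigma))
        # Indirect: A vs R on one slide and B vs R on another, where
        # log(B/R) is set to 0 so that log(A/R) equals the true log ratio.
        a_vs_ref = true_log_ratio + rng.gauss(0.0, sigma)
        b_vs_ref = 0.0 + rng.gauss(0.0, sigma)
        indirect.append(a_vs_ref - b_vs_ref)
    return variance(direct), variance(indirect)


var_direct, var_indirect = simulate_estimates()
print(var_direct, var_indirect, var_indirect / var_direct)
```

The printed ratio should be close to 2, which is why, slide for slide, direct comparisons are the more efficient choice when the design permits them.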
Kenny Q Ye and Anil Dhundale, SUNY at
Pooling or not pooling in microarray experiments - an experimental
design point of view
Microarray experiments are often used to detect differences in gene
expression between two populations of cells: a test population and
a control population. In many cases, however, such as comparisons among
individuals in a population, biological variability produces changes that
are irrelevant to the question of interest, and it becomes important to assay
many individual samples to obtain statistically meaningful results. Unfortunately,
the cost of performing some types of microarray experiments can be prohibitive.
A potentially effective but not well publicized alternative is to pool
individual RNA samples together for hybridization on a single microarray.
This method can dramatically reduce the experimental costs while maintaining
high power in detecting the changes in expression levels that relate to
the specific question of interest. In this talk, we will discuss why this
technique works and the optimal design strategy for pooling. This idea
will also be illustrated by a synthetic experiment and a real experiment
that studies atrial fibrillation (Afib), a serious cardiac condition
that affects a large percentage of the population but whose mechanism
is not well understood.
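One way to see why pooling can work is a simple variance calculation. The sketch below is not the authors' derivation: it assumes an additive model in which each array measures the average of the individuals in its pool plus independent technical error, with biological variance sigma_bio2 between individuals and technical variance sigma_tech2 per array. The function name and all numbers are made up for illustration.

```python
def variance_of_group_mean(n_arrays, pool_size, sigma_bio2, sigma_tech2):
    """Variance of the estimated mean expression for one group when
    n_arrays arrays are run, each on a pool of pool_size individuals.

    Assumed model: biological variation averages over every individual
    sampled, while technical variation averages only over arrays.
    """
    n_individuals = n_arrays * pool_size
    return sigma_bio2 / n_individuals + sigma_tech2 / n_arrays


# Hypothetical variances: biological variation dominates technical noise.
sigma_bio2, sigma_tech2 = 1.0, 0.25

unpooled_4 = variance_of_group_mean(4, 1, sigma_bio2, sigma_tech2)
pooled_4x3 = variance_of_group_mean(4, 3, sigma_bio2, sigma_tech2)
unpooled_12 = variance_of_group_mean(12, 1, sigma_bio2, sigma_tech2)

print(unpooled_4, pooled_4x3, unpooled_12)
```

With these made-up numbers, four arrays of three-individual pools recover most of the precision of twelve single-sample arrays at a third of the array cost. The gain shrinks as technical variance grows relative to biological variance, which is the trade-off an optimal pooling design has to balance.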