
Workshop in Honour of David F. Andrews  May 23 and 24, 2002
Short Course in Microarray Data Analysis  May 25, 2002
Speaker Abstracts

David R. Brillinger
University of California, Berkeley
An Exploratory Data Analysis (EDA) of the Paths of Moving Animals
This work presents an EDA of the trajectories of deer and elk
moving about in the Starkey Experimental Forest and Range in eastern
Oregon. The animals' movements may be affected by habitat variables
and the behavior of the other animals. In this work a model based on
stochastic differential equations is developed in successive stages.
Equations of motion are set down, motivated by the corresponding
equations of physics. Functional parameters appearing
in the equations are estimated nonparametrically and plots of
vector fields of animal movements are prepared. Residuals are
used to look for interactions amongst the movements of the animals.
There are exploratory analyses of various sorts. Statistical inferences
are based on Fourier transforms of the data, which are unequally
spaced. The material is motivated by quotes from the writings of
John Tukey. The work is joint with researchers
at the US Forest Service.
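The abstract does not state the equations of motion. As a minimal sketch, assuming a planar model dr = mu(r) dt + sigma dB whose drift mu is minus the gradient of a potential H, one can simulate a path with the Euler-Maruyama scheme and then recover the drift nonparametrically by kernel-smoothing the increments divided by the time step. The potential H(x, y) = (x^2 + y^2)/2, the Gaussian kernel, the bandwidth and all function names below are illustrative assumptions, not the authors' choices.

```python
import math
import random


def drift(x, y):
    # Hypothetical drift: minus the gradient of the potential H(x, y) = (x**2 + y**2) / 2,
    # i.e. a pull back towards a "home" location at the origin.
    return -x, -y


def simulate_path(n_steps, dt, sigma, seed=0):
    """Euler-Maruyama discretisation of dr = mu(r) dt + sigma dB in the plane."""
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(n_steps):
        mx, my = drift(x, y)
        x += mx * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        y += my * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append((x, y))
    return path


def estimate_drift(path, dt, x0, y0, bandwidth):
    """Nadaraya-Watson estimate of the drift at (x0, y0) from increments divided by dt."""
    num_x = num_y = den = 0.0
    for (xa, ya), (xb, yb) in zip(path, path[1:]):
        # Gaussian kernel weight from the distance between the step's start and (x0, y0).
        w = math.exp(-((xa - x0) ** 2 + (ya - y0) ** 2) / (2 * bandwidth ** 2))
        num_x += w * (xb - xa) / dt
        num_y += w * (yb - ya) / dt
        den += w
    return num_x / den, num_y / den


dt = 0.05
path = simulate_path(n_steps=40_000, dt=dt, sigma=1.0, seed=1)
est = estimate_drift(path, dt, x0=1.0, y0=0.0, bandwidth=0.3)
print("estimated drift at (1, 0):", est, "model drift:", drift(1.0, 0.0))
```

Evaluating `estimate_drift` over a grid of (x0, y0) points and drawing an arrow at each gives a vector-field plot. With real telemetry the fixes are unequally spaced, so dt would vary from step to step.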

Sir David R. Cox
Nuffield College and Department of Statistics, University of Oxford
Graphical models for the interpretation of data: some recent developments
Graphical representations of statistical relationships are generalizations
of Sewall Wright's path analysis. They are used in a number of contexts;
in this paper the emphasis is on the analysis of empirical data.
Several different types of graph are needed. The relation between
graphical representations and matrices is used to show the consequences
of certain manipulations of the graphs. The relation with discussions
of statistical causality is outlined.
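As a small illustration of how matrices expose the consequences of manipulating a graph, the sketch below takes a hypothetical linear chain X1 -> X2 -> X3 with independent unit-variance errors and made-up path coefficients a = 2 and b = 3. The concentration (inverse covariance) matrix has a zero exactly where the graph has no X1-X3 edge; after marginalising over X2 that zero disappears, so the reduced graph needs an X1-X3 edge. Exact fractions are used so that the zeros are exact.

```python
from fractions import Fraction


def matmul(A, B):
    """Plain matrix product for lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


# Hypothetical path coefficients for the chain X1 -> X2 -> X3:
# X1 = e1, X2 = a*X1 + e2, X3 = b*X2 + e3, errors independent with unit variance.
a, b = Fraction(2), Fraction(3)

# I - B, where B[i][j] is the coefficient of X_j in the equation for X_i.
I_minus_B = [[1, 0, 0],
             [-a, 1, 0],
             [0, -b, 1]]

# Inverse of I - B (unit lower triangular, so it can be written down directly).
inv_I_minus_B = [[1, 0, 0],
                 [a, 1, 0],
                 [a * b, b, 1]]

# Covariance Sigma = (I - B)^{-1} (I - B)^{-T}; concentration K = Sigma^{-1} = (I - B)^T (I - B).
Sigma = matmul(inv_I_minus_B, transpose(inv_I_minus_B))
K = matmul(transpose(I_minus_B), I_minus_B)

# The zero in K[0][2] is the missing X1 - X3 edge: X1 and X3 are independent given X2.
print("K[0][2] =", K[0][2])

# Manipulation: marginalise over X2. The 2x2 covariance of (X1, X3) is read off Sigma;
# its inverse has a non-zero off-diagonal entry, so an X1 - X3 edge appears.
S = [[Sigma[0][0], Sigma[0][2]], [Sigma[2][0], Sigma[2][2]]]
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]
K_marginal = [[S[1][1] / det, -S[0][1] / det], [-S[1][0] / det, S[0][0] / det]]
print("marginal concentration off-diagonal:", K_marginal[0][1])
```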

Augustine Kong
deCode Genetics
A High Resolution Recombination Map of the Human Genome
Recombination is a mechanism of DNA mixing from one generation to
the next and plays an important role in human evolution. Results
based on new data, with a sample six times larger than previously available,
will be presented. Emphasis will be on the interpretation and implications
of the data, although we will also touch on various statistical
challenges.

Michael A. Newton
University of Wisconsin-Madison
A statistical approach to modeling genomic aberrations in cancer cells
I will discuss a modeling strategy for genomic-aberration data
which allows us to infer combinations of aberrations that together
increase the chance that a precancerous cell will have a descendant
tumor lineage. The likelihood component involves a network of
pathway structures, and MCMC is used to sample from the space of
these oncogenic networks. I illustrate the methodology with chromosome-based
comparative genomic hybridizations from several recent studies,
and I will draw some comparisons with the oncogenic-tree methods
of R. Desper and colleagues.

Daryl Pregibon, with Corinna Cortes & Chris Volinsky
AT&T Shannon Labs
Graph Mining: Discovery in Large Networks
Large financial and telecommunication networks provide a rich
source of problems for the data mining community. The problems
are inherently quite distinct from traditional data mining in
that the data records, representing transactions between pairs
of entities, are not independent. Indeed, it is often the linkages
between entities that are of primary interest. A second factor,
network dynamics, induces further challenges as new nodes and
edges are introduced through time while old edges and nodes disappear.
We discuss our approach to representing and mining large sparse
graphs. Several applications in telecommunications fraud detection
are used to illustrate the benefits of our approach.
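The abstract does not spell out the representation. One way to keep a large, changing transaction graph sparse, and to let old edges fade as traffic stops, is to store only the k heaviest outgoing edges per node and to blend each period's traffic into the stored weights with exponential decay. The sketch below assumes that scheme; the class name, the decay factor `theta`, the cutoff `k` and the `min_weight` threshold are all hypothetical illustrations, not the authors' published design.

```python
import heapq
from collections import defaultdict


class DecayedSparseGraph:
    """Hypothetical store for a changing transaction graph: per-node top-k edges with decay."""

    def __init__(self, theta=0.85, k=3, min_weight=1e-3):
        self.theta = theta            # weight kept from the previous period (0 < theta < 1)
        self.k = k                    # maximum number of outgoing edges stored per node
        self.min_weight = min_weight  # edges that decay below this are dropped
        self.edges = {}               # node -> {neighbour: weight}

    def update(self, period_weights):
        """Blend one period of traffic, given as {(src, dst): weight}, into the stored graph."""
        today = defaultdict(dict)
        for (src, dst), w in period_weights.items():
            today[src][dst] = today[src].get(dst, 0.0) + w
        for src in set(self.edges) | set(today):
            old = self.edges.get(src, {})
            new = today.get(src, {})
            blended = {
                dst: self.theta * old.get(dst, 0.0) + (1 - self.theta) * new.get(dst, 0.0)
                for dst in set(old) | set(new)
            }
            # Keep only the k heaviest surviving edges so the stored graph stays sparse.
            kept = heapq.nlargest(
                self.k,
                ((dst, w) for dst, w in blended.items() if w >= self.min_weight),
                key=lambda item: item[1],
            )
            if kept:
                self.edges[src] = dict(kept)
            else:
                self.edges.pop(src, None)  # node has gone quiet; forget it

    def neighbours(self, node):
        """Stored edges of `node`, heaviest first."""
        return sorted(self.edges.get(node, {}).items(), key=lambda item: -item[1])


g = DecayedSparseGraph(theta=0.5, k=2)
g.update({("A", "B"): 10.0, ("A", "C"): 4.0, ("A", "D"): 1.0})
g.update({("A", "C"): 8.0})
print(g.neighbours("A"))  # → [('C', 5.0), ('B', 2.5)]
```

The weak edge A -> D is pruned after the first period by the top-k rule, and in the second period C overtakes B because only C carries new traffic.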

James Robins
Harvard University
Optimal Treatment Regimes
We discuss a new approach to estimation of the optimal treatment
regime or strategy from longitudinal observational data. This approach
is based on so-called G-estimation of optimal regime structural
nested mean models. It is an extension of the novel approach recently
developed by Susan Murphy.
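The abstract concerns regimes with several decision points. The sketch below shows only the simplest single-decision case of G-estimation, under stated assumptions: a linear blip function A(psi0 + psi1 L) giving the effect of treatment A at covariate value L, no unmeasured confounding, and a known treatment probability p(L). Under these assumptions Y - A(psi0 + psi1 L) is mean-independent of A given L, so solving the estimating equation sum_i (A_i - p(L_i)) (1, L_i) (Y_i - A_i(psi0 + psi1 L_i)) = 0 recovers psi, and the estimated optimal rule treats exactly when psi0 + psi1 L > 0. The simulated data and all names are hypothetical; this is not the multi-stage estimator discussed in the talk.

```python
import math
import random


def simulate(n, psi0, psi1, seed=0):
    """Hypothetical single-decision observational data (L, A, Y, p)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        l = rng.gauss(0.0, 1.0)
        p = 1.0 / (1.0 + math.exp(-l))   # treatment probability depends on L (confounding)
        a = 1 if rng.random() < p else 0
        # Outcome: effect of L, plus the blip A * (psi0 + psi1 * L), plus noise.
        y = 2.0 * l + a * (psi0 + psi1 * l) + rng.gauss(0.0, 1.0)
        data.append((l, a, y, p))
    return data


def g_estimate(data):
    """Solve sum_i (A_i - p_i) (1, L_i) (Y_i - A_i (psi0 + psi1 L_i)) = 0 for (psi0, psi1)."""
    # The estimating equations are linear in psi: accumulate the symmetric 2x2 system M psi = v.
    m00 = m01 = m11 = v0 = v1 = 0.0
    for l, a, y, p in data:
        r = a - p  # treatment residual, using the known treatment probability
        m00 += r * a
        m01 += r * a * l
        m11 += r * a * l * l
        v0 += r * y
        v1 += r * l * y
    det = m00 * m11 - m01 * m01
    return (v0 * m11 - m01 * v1) / det, (m00 * v1 - m01 * v0) / det


def treat(l, psi):
    """Estimated optimal rule: treat exactly when the estimated blip is positive."""
    return psi[0] + psi[1] * l > 0


data = simulate(20_000, psi0=-0.5, psi1=1.0, seed=42)
psi_hat = g_estimate(data)
print("psi_hat:", psi_hat)
print("treat at L = 2:", treat(2.0, psi_hat), "treat at L = -1:", treat(-1.0, psi_hat))
```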

Elizabeth Thompson
University of Washington
Monte Carlo estimation of multipoint linkage lod scores
The computation of multipoint linkage log-likelihoods is an important
tool in localizing genes for human traits, particularly on extended
pedigrees where observations may be sparse. Many real data analysis
problems are beyond the scope of exact likelihood computation, and
Markov chain Monte Carlo (MCMC) provides an alternative approach.
In any MCMC estimation procedure there are two major issues: the
mixing properties of the MCMC samplers, and the Monte Carlo variance
of estimators. To improve the sampling process we have adopted a
variety of tools from MCMC methodology, including block-Gibbs updates,
integrated proposals, and Metropolis-Hastings restarts using sequential
imputation proposals. Additionally, we use a pseudo-Bayesian approach
that retrieves a likelihood estimate from realizations from a posterior
distribution. Our estimators use Rao-Blackwellized versions of the
usual simple count estimators. Together, these tools provide accurate
and effective computation, as illustrated by several real-data examples.
This research is joint work with Andrew George.
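The pedigree samplers in the talk are far more elaborate, but the difference between a simple count estimator and a Rao-Blackwellized one can be seen on a toy problem. The sketch below runs a two-component Gibbs sampler on a hypothetical pair of binary variables with a known joint distribution and estimates P(X = 1) = 0.6 in two ways: by averaging the sampled indicator of X = 1, and by averaging its exact conditional probability given the other variable. The joint table, run lengths and function names are illustrative assumptions.

```python
import random
import statistics

# Hypothetical joint distribution of two binary variables: P[(x, y)], so that P(X = 1) = 0.6.
P = {(0, 0): 0.3, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.4}


def p_x1_given_y(y):
    return P[(1, y)] / (P[(0, y)] + P[(1, y)])


def p_y1_given_x(x):
    return P[(x, 1)] / (P[(x, 0)] + P[(x, 1)])


def gibbs_estimates(n_iter, seed=0, burn_in=100):
    """Two-component Gibbs sampler; returns (count estimate, Rao-Blackwellized estimate)."""
    rng = random.Random(seed)
    x, y = 0, 0
    count_sum = rb_sum = 0.0
    for t in range(burn_in + n_iter):
        prob_x1 = p_x1_given_y(y)   # exact conditional probability of X = 1 given current y
        x = 1 if rng.random() < prob_x1 else 0
        y = 1 if rng.random() < p_y1_given_x(x) else 0
        if t >= burn_in:
            count_sum += x          # simple count estimator: average the sampled indicator
            rb_sum += prob_x1       # Rao-Blackwellized: average its conditional expectation
    return count_sum / n_iter, rb_sum / n_iter


count_est, rb_est = gibbs_estimates(50_000, seed=1)
print("count:", count_est, "Rao-Blackwellized:", rb_est)

# Spread of the two estimators across independent short runs.
runs = [gibbs_estimates(500, seed=s) for s in range(200)]
var_count = statistics.variance([c for c, _ in runs])
var_rb = statistics.variance([r for _, r in runs])
print("variance, count:", var_count, "variance, Rao-Blackwellized:", var_rb)
```

Both estimators target the same quantity, but the Rao-Blackwellized one averages a smoother function of the chain, which shows up as a smaller spread across repeated runs.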

Rob Tibshirani
Stanford University
Least angle regression, forward stagewise and the lasso
We discuss "Least Angle Regression" ("LARS"), a new model selection
algorithm. This is a useful and less greedy version of traditional
forward selection methods. Three main properties of
LARS are derived. (1) A simple modification of the LARS algorithm
implements the Lasso, an attractive version of Ordinary Least
Squares that constrains the sum of the absolute regression coefficients;
the LARS modification calculates all possible Lasso estimates
for a given problem in an order of magnitude less computer time
than previous methods. (2) A different LARS modification efficiently
implements Forward Stagewise linear regression, another promising
new model selection method; this connection explains the similar
numerical results previously observed for the Lasso and Stagewise,
and helps us understand the properties of both methods, which are
seen as constrained versions of the simpler LARS algorithm. (3)
A simple approximation for the degrees of freedom of a LARS estimate
is available, from which we derive a Cp estimate of prediction
error; this allows a principled choice among the range of possible
LARS estimates.
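Point (3) can be illustrated with the familiar Cp formula. Assuming that the k-step fit has approximately k degrees of freedom, and that sigma2 is an estimate of the error variance (for instance from the full least-squares fit), Cp_k = RSS_k / sigma2 - n + 2k, and the step with the smallest Cp is chosen. The residual sums of squares below are made-up numbers for illustration only, and the function names are hypothetical.

```python
def cp_scores(rss_path, sigma2, n):
    """Cp_k = RSS_k / sigma2 - n + 2k, taking the degrees of freedom of the k-step fit as k."""
    return [rss / sigma2 - n + 2 * k for k, rss in enumerate(rss_path)]


def choose_step(rss_path, sigma2, n):
    """Index k of the fit along the path with the smallest Cp."""
    scores = cp_scores(rss_path, sigma2, n)
    return min(range(len(scores)), key=scores.__getitem__)


# Hypothetical residual sums of squares after k = 0, 1, ..., 5 steps, with n = 100
# observations and an assumed error variance sigma2 = 1.0.
rss_path = [250.0, 140.0, 101.0, 97.5, 96.8, 96.5]
print(cp_scores(rss_path, sigma2=1.0, n=100))
print("chosen step:", choose_step(rss_path, sigma2=1.0, n=100))  # → 3
```

The 2k term penalises each extra step, so the small drop in RSS after step 3 is not enough to justify going further.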
LARS and its variants are computationally efficient: the paper
describes a publicly available algorithm that requires only the
same order of magnitude of computational effort as Ordinary Least
Squares applied to the full set of covariates.
This is joint work with Bradley Efron, Trevor Hastie and Iain Johnstone.
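The Forward Stagewise procedure mentioned in point (2) can be sketched in its literal, incremental form: at each step, find the covariate most correlated with the current residual and move its coefficient by a small fixed amount eps in the direction of that correlation. This slow loop is only an illustration, assuming centred covariates on a common scale; the step size, step count, toy data and function name are hypothetical, and the abstract's LARS modification is what computes this kind of path efficiently.

```python
def forward_stagewise(X, y, eps=0.01, n_steps=500):
    """Incremental forward stagewise regression; X is a list of rows, columns centred and on a common scale."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)
    for _ in range(n_steps):
        # Inner product of each covariate with the current residual.
        corr = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if abs(corr[j]) < 1e-12:
            break  # residual is orthogonal to every covariate
        step = eps if corr[j] > 0 else -eps
        beta[j] += step  # small move on the single most correlated covariate
        for i in range(n):
            resid[i] -= step * X[i][j]
    return beta


# Toy design: centred, mutually orthogonal columns; y is generated exactly from beta = (1.5, -0.5).
X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y = [1.0, 2.0, -2.0, -1.0]
beta = forward_stagewise(X, y)
print(beta)
```

The coefficients creep towards the least-squares values (1.5, -0.5) and then oscillate within about eps of them; tracking beta after every step traces out the stagewise coefficient path.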


