October 5 -
Statistical Learning with Large Numbers of Predictor Variables
Many present-day applications of statistical learning involve large
numbers of predictor variables. Often that number is much larger than
the number of cases or observations available to train the learning
algorithm. In such situations traditional methods fail. Recently, new
techniques based on regularization have been developed that can often
produce accurate learning models in these settings. This talk will
describe the basic principles underlying the method of regularization and
then focus on those methods exploiting the sparsity of the predicting
model. The potential merits of these methods are then explored by
October 6 - 11:00 a.m.
Predictive Learning via Rule Ensembles
General regression and classification models are constructed as
linear combinations of simple rules derived from the data. Each
rule consists of a conjunction of a small number of simple statements
concerning the values of individual input variables. These rule
ensembles are shown to produce predictive accuracy comparable to
the best methods. However, their principal advantage lies in interpretation.
Because of its simple form, each rule is easy to understand, as
is its influence on individual predictions, selected subsets of
predictions, or globally over the entire space of joint input variable
values. Similarly, the
degree of relevance of the respective input variables can be assessed
globally, locally in different regions of the input space, or at
individual prediction points. Techniques are presented for automatically
identifying those variables that are involved in interactions with
other variables, the strength and degree of those interactions,
as well as the identities of the other variables with which they
interact. Graphical representations are used to visualize both main
and interaction effects.
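
As a rough illustration of the model form described in this abstract, the sketch below treats a prediction as an intercept plus a weighted sum of rules, where each rule is a conjunction of simple statements about individual input variables. The variable names (age, income), thresholds, coefficients, and helper functions are all hypothetical and written by hand; in the approach described above, the rules are derived from the data and the weights are fitted, which this sketch does not attempt.

```python
from typing import Callable, Dict, List, Tuple

Point = Dict[str, float]
Condition = Callable[[Point], bool]

# Each rule: (label, list of simple conditions, coefficient).
# All names, thresholds, and coefficients are hypothetical, chosen by hand
# for illustration; they are not learned from any data.
RULES: List[Tuple[str, List[Condition], float]] = [
    ("age < 40 and income >= 50",
     [lambda x: x["age"] < 40, lambda x: x["income"] >= 50], 2.0),
    ("income < 30", [lambda x: x["income"] < 30], -1.5),
    ("age >= 60", [lambda x: x["age"] >= 60], 0.5),
]
INTERCEPT = 1.0


def rule_fires(conditions: List[Condition], x: Point) -> bool:
    """A rule fires only if every one of its simple statements holds."""
    return all(cond(x) for cond in conditions)


def contributions(x: Point) -> Dict[str, float]:
    """Per-rule contribution to the prediction at a single point."""
    return {
        label: (coef if rule_fires(conds, x) else 0.0)
        for label, conds, coef in RULES
    }


def predict(x: Point) -> float:
    """Linear combination: intercept plus coefficients of the rules that fire."""
    return INTERCEPT + sum(contributions(x).values())


if __name__ == "__main__":
    point = {"age": 35.0, "income": 62.0}
    print("prediction:", predict(point))
    for label, value in contributions(point).items():
        print(f"  {label}: {value:+.1f}")
```

Because each rule either contributes its coefficient or nothing, the influence of a rule on an individual prediction can be read directly from the output of contributions.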
Dr. Friedman is one of the world's leading researchers in statistics
and data mining. He has been a Professor of Statistics at Stanford
University for nearly 20 years and has published on a wide range
of data-mining topics including nearest neighbor classification,
logistic regression, and high-dimensional data analysis. His
primary research interest is in the area of machine learning.
The Distinguished Lecture Series in Statistical Science was
established in 2000 and takes place annually. It consists of two lectures
by a prominent statistical scientist. The first lecture is intended
for a broad mathematical sciences audience. The series occasionally
takes place at a member university and is tied to any current thematic
program related to statistical science; in the absence of such a program
the speaker is chosen independently of current activity at the Institute.
A nominating committee of representatives from the member universities
solicits nominations from the Canadian statistical community and makes
a recommendation to the Fields Scientific Advisory Panel, which is
responsible for the selection of speakers.