Many present-day applications of statistical learning involve large numbers of predictor variables. Often that number is much larger than the number of cases or observations available to train the learning algorithm. In such situations traditional methods fail. Recently, new techniques based on regularization have been developed that can often produce accurate learning models in these settings. This talk will describe the basic principles underlying the method of regularization and then focus on those methods exploiting the sparsity of the predicting model. The potential merits of these methods are then explored by example.

October 6 - 11:00 a.m.

Predictive Learning via Rule Ensembles

General regression and classification models are constructed as linear combinations of simple rules derived from the data. Each rule consists of a conjunction of a small number of simple statements concerning the values of individual input variables. These rule ensembles are shown to produce predictive accuracy comparable to the best methods. However, their principal advantage lies in interpretation. Because of its simple form, each rule is easy to understand, as is its influence on individual predictions, selected subsets of predictions, or globally over the entire space of joint input variable values. Similarly, the degree of relevance of the respective input variables can be assessed globally, locally in different regions of the input space, or at individual prediction points. Techniques are presented for automatically identifying those variables that are involved in interactions with other variables, the strength and degree of those interactions, as well as the identities of the other variables with which they interact. Graphical representations are used to visualize both main and interaction effects.
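
As a rough illustration of the model form described in this abstract, the sketch below represents each rule as a conjunction of simple threshold statements on individual input variables and forms a prediction as an intercept plus the coefficients of the rules that hold. It is a minimal sketch under stated assumptions, not the method presented in the lecture: the names `Rule`, `predict`, and `contributions`, the variables `age` and `income`, and all thresholds and coefficients are hypothetical, and the step that derives rules and coefficients from data is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# A condition is one simple statement about a single input variable,
# e.g. ("age", ">=", 40.0). All names and values are hypothetical.
Condition = Tuple[str, str, float]


@dataclass
class Rule:
    conditions: List[Condition]  # conjunction: every condition must hold
    coefficient: float           # weight of this rule in the linear combination

    def fires(self, x: Dict[str, float]) -> bool:
        """Return True if all conditions of the rule hold for input x."""
        for name, op, threshold in self.conditions:
            if op == "<":
                ok = x[name] < threshold
            elif op == ">=":
                ok = x[name] >= threshold
            else:
                raise ValueError(f"unsupported operator: {op}")
            if not ok:
                return False
        return True


def predict(intercept: float, rules: List[Rule], x: Dict[str, float]) -> float:
    """Linear combination of rules: intercept plus coefficients of rules that fire."""
    return intercept + sum(r.coefficient for r in rules if r.fires(x))


def contributions(rules: List[Rule], x: Dict[str, float]) -> List[float]:
    """Per-rule contribution to one prediction (0.0 when the rule does not fire)."""
    return [r.coefficient if r.fires(x) else 0.0 for r in rules]


# Made-up ensemble with two rules, purely for illustration.
rules = [
    Rule([("age", ">=", 40.0), ("income", "<", 50.0)], coefficient=1.5),
    Rule([("age", "<", 40.0)], coefficient=-0.5),
]
intercept = 2.0

x = {"age": 45.0, "income": 30.0}
print(predict(intercept, rules, x))  # only the first rule fires
print(contributions(rules, x))
```

Because each rule either applies to an input or does not, the per-rule contributions returned by `contributions` show directly which rules drove a given prediction, which is the kind of interpretation the abstract emphasizes.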
-----------------------------------------

Dr. Friedman is one of the world's leading researchers in statistics and data mining. He has been a Professor of Statistics at Stanford University for nearly 20 years and has published on a wide range of data-mining topics including nearest-neighbor classification, logistic regression, and high-dimensional data analysis. His primary research interest is in the area of machine learning.

The Distinguished Lecture Series in Statistical Science was established in 2000 and takes place annually. It consists of two lectures by a prominent statistical scientist. The first lecture is intended for a broad mathematical sciences audience. The series occasionally takes place at a member university and is tied to any current thematic program related to statistical science; in the absence of such a program the speaker is chosen independently of current activity at the Institute. A nominating committee of representatives from the member universities solicits nominations from the Canadian statistical community and makes a recommendation to the Fields Scientific Advisory Panel, which is responsible for the selection of speakers.


DISTINGUISHED LECTURE SERIES

Distinguished Lecture Series in Statistical Science October 5-6, 2011 Jerome H. Friedman Department of Statistics, Stanford University

October 5 - 3:30 p.m. October 6 - 11:00 a.m.

Room 230, Fields Institute

October 5 - 3:30 p.m.

Statistical Learning with Large Numbers of Predictor Variables

October 6 - 11:00 a.m. Predictive Learning via Rule Ensembles
