May 27, 2024

Distinguished Lecture Series in Statistical Science
September 28 and October 26, 2000

Peter G. Hall

Australian National University
1. Data Tuning
Until recently, altering one's data was sacrilegious. A major problem was that we didn't know how to do it objectively. Altering the data according to objective criteria turns out to be a surprisingly computer-intensive business, and in many instances wouldn't have been feasible a decade or two ago. Today, however, thanks to the ready availability of computing power, we can do all sorts of complex things to the data. Data-tuning methods alter the data so as to enhance the performance of a relatively elementary technique. The idea is to retain the advantageous features of the simpler method while improving its performance in specific ways. Approaches to data tuning include physically altering the data (data sharpening), reweighting or tilting the data (the biased bootstrap), adding extra "pseudo data" derived from the original data, or a combination of all three. Tilting methods date back to the 1950s, although only recently have they become popular. Evidence is growing, however, that sharpening is more effective than tilting, since it doesn't reduce effective sample size.
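To make the remark about effective sample size concrete, here is a minimal sketch of tilting, not the lecture's own procedure. It assumes exponential tilting (weights proportional to exp(t * x_i)) chosen so that the weighted mean hits a target value, and it assumes the common definition of effective sample size as 1 / sum(p_i^2). The function names, the toy data and the target are hypothetical.

```python
import math

def tilt_weights(x, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Return weights p_i proportional to exp(t * x_i) whose weighted mean
    equals target_mean. Assumes min(x) < target_mean < max(x)."""
    def weights(t):
        m = max(t * xi for xi in x)              # subtract max for stability
        w = [math.exp(t * xi - m) for xi in x]
        s = sum(w)
        return [wi / s for wi in w]

    def weighted_mean(t):
        return sum(p * xi for p, xi in zip(weights(t), x))

    # The weighted mean is increasing in t, so bisection finds the root.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if weighted_mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))

def effective_sample_size(p):
    """Assumed definition: 1 / sum of squared weights (equals n when uniform)."""
    return 1.0 / sum(pi * pi for pi in p)

# Hypothetical toy data; the unweighted mean is 1.4375.
x = [1.2, 0.7, 2.9, 1.8, 0.3, 2.2, 1.5, 0.9]
p = tilt_weights(x, target_mean=2.0)
tilted_mean = sum(pi * xi for pi, xi in zip(p, x))
ess = effective_sample_size(p)
```

Under this definition, uniform weights give an effective sample size of exactly n, whereas the tilted weights give something smaller. Sharpening moves the data points themselves and leaves the weights uniform, which is one way to read the claim that it does not reduce effective sample size.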
2. Estimating Fault Lines and Boundaries
A fault line in a regression model with bivariate design, Z_i = f(X_i, Y_i) + error, is a curve in the (x,y)-plane along which the function z = f(x,y) has a fault-type jump discontinuity. Such problems arise, for example, in the measurement of benthic impacts or the estimation of lines along which sea-surface temperatures change. The fault is not necessarily the result of simple 'slippage', and in particular gradients do not necessarily match at the top and bottom of the fault. We shall describe methodology for both point and interval estimation of fault lines, and for related problems such as estimation of fault lines in density or intensity surfaces, or estimation of support boundaries.
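The regression model above can be illustrated with a small simulation. The sketch below is only a crude point estimator under stated assumptions, not the methodology of the lecture: the surface, the straight fault x = 0.5 + 0.2 (y - 0.5), the jump size, the noise level and all function names are hypothetical. Within each horizontal strip of y values it picks the x location where the one-sided local means of Z differ most. The gradients on the two sides of the simulated fault are deliberately different.

```python
import random

def true_fault(y):
    """Hypothetical fault line: x as a function of y."""
    return 0.5 + 0.2 * (y - 0.5)

def simulate(n=2000, jump=2.0, noise_sd=0.1, seed=0):
    """Draw (x, y, z) with z = f(x, y) + error on the unit square."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x, y = rng.random(), rng.random()
        # Different gradients on each side: no simple 'slippage'.
        f = 0.5 * x if x < true_fault(y) else jump + 0.1 * x
        data.append((x, y, f + rng.gauss(0.0, noise_sd)))
    return data

def estimate_fault(data, n_strips=5, h=0.1, grid_step=0.01):
    """For each strip in y, return (strip centre, estimated fault x)."""
    estimates = []
    n_grid = int(round((1.0 - 2.0 * h) / grid_step))
    for k in range(n_strips):
        y_lo, y_hi = k / n_strips, (k + 1) / n_strips
        strip = [(x, z) for x, y, z in data if y_lo <= y < y_hi]
        best_c, best_gap = None, -1.0
        for j in range(n_grid + 1):
            c = h + j * grid_step
            left = [z for x, z in strip if c - h <= x < c]
            right = [z for x, z in strip if c <= x < c + h]
            if not left or not right:
                continue  # skip windows with no data on one side
            gap = abs(sum(right) / len(right) - sum(left) / len(left))
            if gap > best_gap:
                best_c, best_gap = c, gap
        estimates.append((0.5 * (y_lo + y_hi), best_c))
    return estimates

data = simulate()
estimates = estimate_fault(data)
for y_c, c_hat in estimates:
    print(f"y = {y_c:.1f}: estimated fault x = {c_hat:.2f}, true x = {true_fault(y_c):.2f}")
```

The strip width and the window half-width h act as smoothing parameters: narrower strips follow a curved fault more closely but leave fewer points in each window. Interval estimation, which the abstract also mentions, is not attempted here.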