
DISTINGUISHED LECTURE SERIES 

April 19, 2015  
December 6, 2012 at 3:30 p.m.
December 7, 2012 at 10:00 a.m.

Inference on Hazard Ratios and Survival Probabilities from Two-phase Stratified Samples

Norman E. Breslow (Department of Biostatistics, University of Washington, Seattle)
Thomas Lumley (Department of Statistics, University of Auckland, NZ)
Jon A. Wellner (Department of Biostatistics; Department of Statistics, University of Washington, Seattle)

Epidemiologists employ stratified case-control studies nested within a defined cohort so that collection of costly covariate information, such as bioassays of stored tissue samples, may be limited to the most informative participants. These designs involve two-phase stratified samples: a simple random sample (the main cohort) from an infinite superpopulation (model) at Phase I; and a finite population stratified (case-control) sample at Phase II. One approach to analysis involves inverse probability weighting (IPW) of general estimating equations.

In previous work we investigated IPW of infinite dimensional likelihood equations for both Euclidean and non-Euclidean parameters in semiparametric models, of which the paradigm is the Cox model for survival data. The key idea was to separate the likelihood calculations, which are the same as those for simple random sampling, from weak convergence results for the IPW empirical process. For estimation of the Euclidean parameter (log hazard ratios), the problem was asymptotically equivalent to that of using the Phase II sample to estimate an unknown finite population total: the total of the unknown influence function contributions for subjects in the main cohort. Efficiency was improved, sometimes dramatically, through adjustment of the sampling weights by calibration to totals of auxiliary variables known for everyone or by estimation of the known weights using these same variables.
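
The finite population total problem described above can be illustrated with a small sketch. It is not the authors' method or the R survey package: it is a minimal Python example under stated assumptions. The simulated cohort, the variable `z` (a stand-in for the influence function contributions), the single auxiliary variable `x`, and the function names `ipw_total`, `calibrate_weights`, and `simulate_phase_two` are all hypothetical. The calibration shown is the simplest linear form, in which each design weight is rescaled so that the weighted Phase II total of `x` matches its known cohort total.

```python
import random


def ipw_total(values, weights):
    """Inverse probability weighted estimate of a finite population total."""
    return sum(w * v for v, w in zip(values, weights))


def calibrate_weights(weights, aux, aux_total):
    """Linear calibration to one auxiliary total (an illustrative assumption).

    Returns weights w_i * (1 + lam * x_i), with lam chosen so that the
    weighted sample total of the auxiliary variable equals aux_total.
    """
    wx = sum(w * x for w, x in zip(weights, aux))
    wxx = sum(w * x * x for w, x in zip(weights, aux))
    lam = (aux_total - wx) / wxx
    return [w * (1.0 + lam * x) for w, x in zip(weights, aux)]


def simulate_phase_two(seed=1, cohort_size=2000, controls_sampled=200):
    """Simulate a cohort (Phase I) and a stratified case-control sample (Phase II)."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(cohort_size):
        x = rng.gauss(1.0, 0.5)            # auxiliary variable, known for everyone
        z = 2.0 * x + rng.gauss(0.0, 0.3)  # costly variable, seen only at Phase II
        is_case = rng.random() < 0.1
        cohort.append((x, z, is_case))
    cases = [s for s in cohort if s[2]]
    controls = [s for s in cohort if not s[2]]
    # Phase II: keep every case, draw a simple random sample of controls.
    sampled_controls = rng.sample(controls, controls_sampled)
    sample = cases + sampled_controls
    # Design weights are inverse sampling fractions within each stratum.
    weights = [1.0] * len(cases) + [len(controls) / controls_sampled] * controls_sampled
    return cohort, sample, weights


cohort, sample, weights = simulate_phase_two()
true_total = sum(z for _, z, _ in cohort)
aux_total = sum(x for x, _, _ in cohort)
xs = [x for x, _, _ in sample]
zs = [z for _, z, _ in sample]

ipw_estimate = ipw_total(zs, weights)
cal_weights = calibrate_weights(weights, xs, aux_total)
cal_estimate = ipw_total(zs, cal_weights)

print("true total:         ", round(true_total, 1))
print("IPW estimate:       ", round(ipw_estimate, 1))
print("calibrated estimate:", round(cal_estimate, 1))
```

Because `z` is strongly related to `x` in this simulation, the calibrated estimate would typically be expected to fall closer to the true total than the plain IPW estimate, which is the kind of efficiency gain the abstract describes. A realistic analysis would calibrate on several auxiliary variables and stratum indicators at once.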
After reviewing these results, this talk considers the extensions needed for joint estimation of hazard ratios and the baseline hazard function in the Cox model, and hence for prediction of survival probabilities. The improvements in prediction possible with calibrated or estimated weights are illustrated via simulations conducted using Lumley's R survey package to analyze data from the National Wilms Tumor Study.

References:
Breslow NE, Wellner JA. Scand J Stat 34:86-102, 2007; 35:186-192, 2008.
Breslow NE, Lumley T et al. Am J Epidemiol 35:1398-1405, 2009.
Breslow NE, Lumley T et al. Stat Biosci 1:32-49, 2009.
Breslow NE, Lumley T. IMS Monograph Series, Wellner Festschrift, in press.
Lumley T. Complex Surveys, New York: Wiley, 2010.

The Distinguished Lecture Series in Statistical Science was established in 2000 and takes place annually. It consists of two lectures by a prominent statistical scientist. The first lecture is intended for a broad mathematical sciences audience. The series occasionally takes place at a member university and is tied to any current thematic program related to statistical science; in the absence of such a program the speaker is chosen independently of current activity at the Institute. A nominating committee of representatives from the member universities solicits nominations from the Canadian statistical community and makes a recommendation to the Fields Scientific Advisory Panel, which is responsible for the selection of speakers.

