February 27, 2015

April 9 (3:30 p.m.) & April 10 (11:00 a.m.), 2015
Distinguished Lecture Series in Statistical Science

Room 230, Fields Institute

Terry Speed
University of California, Berkeley

Terry Speed's research concerns the application of statistics to problems in genetics and molecular biology. These fields have provided many novel challenges of both an applied and a theoretical nature. His major interests within this area are in the mapping of genes in mice and humans, including disease genes and genes contributing to the variation of quantitative traits. The Human Genome Project was a stimulus for a number of the problems he has investigated with his students. Other areas of interest include the analysis of DNA and protein sequences, for example, finding genes or motifs in DNA sequences, and the analysis of microarray data. He is currently on the editorial boards of the Journal of Computational Biology, JASA, Bernoulli, and the Australian and New Zealand Journal of Statistics.

Specialized lecture: Normalization of omic data after 2007
(joint with Johann Gagnon-Bartsch and Laurent Jacob)

Abstract: For over a decade now, normalization of transcriptomic, genomic and, more recently, metabolomic and proteomic data has been something you do to "raw" data to remove biases, technical artifacts and other systematic non-biological features. These features could be due to sample preparation and storage, reagents, equipment, people and so on. It was a "one-off" fix to what I'm going to call removing unwanted variation. Since around 2007, a more nuanced approach has been available, due to J. T. Leek and J. Storey (SVA) and to O. Stegle et al. (PEER). These new approaches do two things differently. The first is that they do not assume the sources of unwanted variation are known in advance; these are inferred from the data. The second is that they deal with the unwanted variation in a model-based way, not "up front." That is, they do it in a problem-specific manner, where different inference problems warrant different model-based solutions. For example, the solution for removing unwanted variation in estimation is not necessarily the same as the one for prediction. Over the last few years, I have been working with Johann Gagnon-Bartsch and Laurent Jacob on these same problems through making use of positive and negative controls, a strategy which we think has some advantages. In this talk I'll review the area, and highlight some of the advantages of working with controls. Illustrations will be from microarray, mass spec and RNA-seq data.
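The negative-control idea in the abstract can be sketched in a few lines. The following is a toy NumPy illustration of the general strategy (estimate factors of unwanted variation from features assumed unaffected by the biology of interest, then regress them out), not the speakers' actual RUV implementation; the simulated data, the control set, and the number of factors k are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy expression matrix: n samples x p features, contaminated by a hidden batch effect.
n, p = 20, 100
batch = np.repeat([0.0, 1.0], n // 2)[:, None]   # unwanted factor (unknown in practice)
alpha = rng.normal(size=(1, p))                  # its effect on each feature
Y = rng.normal(size=(n, p)) + batch @ alpha      # observed data

# Negative controls: features assumed unaffected by the biology of interest,
# so any systematic structure they show is taken to be unwanted variation.
ctl = np.arange(30)                              # hypothetical control set
k = 1                                            # assumed number of unwanted factors

# Estimate the unwanted factors from the control features via SVD ...
U, s, Vt = np.linalg.svd(Y[:, ctl], full_matrices=False)
W = U[:, :k] * s[:k]                             # estimated unwanted-variation factors

# ... then regress them out of the full data matrix.
coef, *_ = np.linalg.lstsq(W, Y, rcond=None)
Y_adj = Y - W @ coef                             # adjusted data, orthogonal to W
```

By construction the least-squares residual `Y_adj` carries no component along the estimated factors `W`; downstream analyses (estimation, testing, prediction) would then use `Y_adj`, or, as the abstract notes, fold `W` into a problem-specific model instead of removing it "up front."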

General lecture: Epigenetics: A New Frontier

Abstract: Scientists have now mapped the human genome. The next frontier is understanding human epigenomes: the 'instructions' which tell the DNA whether to make skin cells or blood cells or other body parts. Apart from a few exceptions, the DNA sequence of an organism is the same whatever cell is considered. So why are the blood, nerve, skin and muscle cells so different, and what mechanism is employed to create this difference? The answer lies in epigenetics. If we compare the genome sequence to text, the epigenome is the punctuation: it shows how the DNA should be read. Advances in DNA sequencing in the last 5-8 years have allowed large amounts of DNA sequence data to be compiled. For every single reference human genome, there will be literally hundreds of reference epigenomes, and their analysis will occupy biologists, bioinformaticians and biostatisticians for some time to come. In this talk I will introduce the topic and the data, and outline some of the challenges.

April 23 (3:30 p.m.) & 24 (11:00 a.m.), 2015
Distinguished Lecture Series in Statistical Science

Room 230, Fields Institute

Bin Yu
University of California, Berkeley

Bin Yu is Chancellor’s Professor in the Departments of Statistics and of Electrical Engineering & Computer Science at the University of California at Berkeley. She was an assistant professor at UW-Madison, a visiting assistant professor at Yale University while on leave from Madison, and a Member of Technical Staff at Lucent Bell Labs. She was Chair of the Department of Statistics at Berkeley from 2009 to 2012, and is a founding co-director of the Microsoft Joint Lab on Statistics and Information Technology at Peking University, where she is also Chair of the scientific advisory committee of the Center for Statistical Sciences. She has published over 80 scientific papers in premier journals in statistics, machine learning, information theory, signal processing, remote sensing, neuroscience, network analysis, and bioinformatics.

She is a Member of the U.S. National Academy of Sciences and a Fellow of the American Academy of Arts and Sciences. She was a Guggenheim Fellow in 2006 and the Bernoulli Society's Tukey Memorial Lecturer in 2011. She was President of the IMS (Institute of Mathematical Statistics) from 2013 to 2014 and is now Past President of the IMS. She is a Fellow of AAAS, IEEE, IMS, and ASA. She has served or is serving on many editorial boards, including those of the Journal of Machine Learning Research (JMLR), the Annals of Statistics (AoS), and the Journal of the American Statistical Association (JASA). She served on the Scientific Advisory Board (SAB) of IPAM at UCLA and is serving on the Board of Trustees (BOT) of ICERM at Brown University. She was co-chair of the National Scientific Committee of SAMSI, and served on the Board of Mathematical Sciences and Applications (BMSA) of the U.S. National Academy of Sciences.
