November 27, 2014


Thematic Program on Statistical Inference, Learning, and Models for Big Data
January to June, 2015
Organizing Committee
Nancy Reid (Toronto)
Yoshua Bengio (Montréal)
Hugh Chipman (Acadia)
Sallie Keller (Virginia Tech)
Lisa Lix (Manitoba)
Richard Lockhart (Simon Fraser)
Ruslan Salakhutdinov (Toronto)
International Advisory Committee
Constantine Gatsonis (Brown)
Susan Holmes (Stanford)
Snehalata Huzurbazar (Wyoming)
Nicolai Meinshausen (ETH Zurich)
Dale Schuurmans (Alberta)
Robert Tibshirani (Stanford)
Bin Yu (UC Berkeley)


This thematic program emphasizes both applied and theoretical aspects of statistical inference, learning and models in big data. The opening conference will serve as an introduction to the program, concentrating on overview lectures and background preparation. Workshops throughout the year will emphasize deep learning, statistical learning, visualization, networks, health and social policy, and physical sciences. A number of allied activities at PIMS, CRM and AARMS are also planned, and listed at the bottom of this page. This thematic program is taking place with the cooperation of the new Canadian Statistical Sciences Institute (CANSSI).
It is expected that all activities will be webcast using the FieldsLive system to permit wide participation.

Conferences and Workshops


Graduate Course 1: Large Scale Machine Learning
Monday, 11 a.m. - 2 p.m., January 5 to March 31 (no classes Feb 16-20), Stewart Library, Fields Institute
Instructor: Russ Salakhutdinov, Departments of Computer Science and Statistical Sciences, University of Toronto

    Description: Statistical machine learning is a very dynamic field that lies at the intersection of statistics and computational sciences. The goal of statistical machine learning is to develop algorithms that can "learn" from data using statistical and computational methods. Over the last decade, driven by rapid advances in numerous fields, such as computational biology, neuroscience, data mining, signal processing, and finance, applications that involve large amounts of high-dimensional data have become common.
    The goal of this course is to introduce core concepts of large-scale machine learning and discuss scalable techniques for analyzing large amounts of data. Both theoretical and practical aspects will be discussed.

Graduate Course 2: Topics in Inference for Big Data
Friday, 1 p.m. - 4 p.m., January 5 to March 31 (no classes Feb 16-20), Stewart Library, Fields Institute
Instructors: Nancy Reid, Department of Statistical Sciences, University of Toronto; Mu Zhu, Department of Statistics and Actuarial Science, University of Waterloo

    Description: This course will introduce students to the topics under discussion during the thematic program on Statistical Inference in Big Data, with a mix of background lectures and guest lectures. The goal is to prepare students, postdoctoral fellows, and other interested participants to benefit from upcoming workshops in the thematic program, and to provide a venue for further discussion of keynote presentations after the workshops.

These courses will be streamed using FieldsLive, and students are welcome to attend online. Students interested in obtaining credit for these courses need to arrange with their home department to have them approved as reading or research courses. We will make available the timetable and requirements for the course at the first lecture in January, 2015.

Postdoctoral fellowships

A limited number of postdoctoral fellowships are available; please see the Fields web page for the advertisement. Applications were due by June 1, 2014 but late applications will also be accepted until the positions are filled. There are opportunities for extended visits of senior (all but degree) graduate students. Please apply through the Application for Participant Support link.

Allied Activity

Special Lectures
April 9-10, 2015 Fields Institute Distinguished Lecture Series in Statistics

Terry Speed
(University of California, Berkeley)

    July 21 – August 15, 2014
    Summer School: Statistical Learning in Big Data
    Instructors: Hugh Chipman, Acadia; Sunny Wang, St. Francis Xavier
    held at AARMS

    December 10 & 11, 2014
    Distinguished Lecture Series in Statistical Science
    Bin Yu, University of California, Berkeley
    Room 230, Fields Institute

    April 20 - 24, 2015
    CANSSI Workshop on Complex spatio-temporal data structures: Methods and applications
    held at Fields Institute

    April 29-30, 2015
    General Scientific Activity: Big Data in Commercial and Retail Banking (1 day)
    with Mark Reesor (Western); Matt Davison (Western); Adam Metzler (Wilfrid Laurier)
    held at Fields Institute

    April 20 – 24, 2015
    Workshop on Statistical Theory for Large-Scale Data
    with Richard Lockhart (Chair), Nicolai Meinshausen
    held at PIMS, University of British Columbia

    May 4 – 8, 2015
    Workshop and Short Course on Statistical and computational challenges in networks, web mining and cybersecurity
    with Hugh Chipman (Chair), François Théberge (U Ottawa)
    held at CRM, Montreal

    May 11–15, 2015
    Workshop on Challenges in Environmental Science
    with Richard Lockhart (Chair), James Zidek (UBC)
    held at PIMS, University of British Columbia

    August, 2015
    Workshop on Deep Learning
    Organizing Committee: Yoshua Bengio, Chair
    held at CRM, Montreal


Workshop Overviews

Preliminary descriptions of the workshops and conferences, from the program proposal.

  • Opening Conference and Boot camp

    Program: The goals are to prepare students, postdoctoral fellows, and interested researchers to benefit from the activities to follow, and to build momentum and generate widespread interest in the thematic program. There will be general lectures on days 1 and 2, and overview lectures on the main themes of the program on the remaining days. Confirmed speakers to date include: Nancy Reid, Hugh Chipman, Michael Jordan, Steve Scott, Jenny Bryan, Robert Bell, Mark Girolami, Han Liu, Jonathan Taylor, Richard Lockhart, Charmaine Dean, Alexandra Schmidt, Bo Li, Martin Wainwright, Anima Anandkumar, Stephen Vavasis, Michael Batty, Chad Gaffield, Sallie Keller, Shane Reese, David Buckeridge, Lisa Lix, Russ Salakhutdinov, Eric Kolaczyk, Patrick Wolfe, Sofia Olhede.
    The tentative timetable is:

    Jan 12-13: Introductory Lectures and Overview
    Jan 14: Inference
    Jan 15: Environmental Science
    Jan 16: Optimization
    Jan 19: Visualization
    Jan 20: Social Policy
    Jan 21: Health Policy
    Jan 22: Deep Learning
    Jan 23: Networks and Machine Learning
  • Big Data and Statistical Machine Learning

    Program: The aim of this workshop is to bring together researchers working on large-scale deep learning and hierarchical models to discuss a number of important challenges, including the ability to perform transfer learning and the best strategies for training these systems on large-scale problems. These problems are "large" in terms of input dimensionality (on the order of millions), number of training samples (on the order of 100 million or more) and number of categories (on the order of several tens of thousands).

  • Optimization and Matrix Methods in Big Data

    Program: Day 1: Overview lectures by Vavasis, Anandkumar, Drineas, Friedlander, Wainwright. Days 2-5: Keynote lecture each morning; three or four research lectures; presentations by postdoctoral fellows and students.

  • Visualization for Big Data: Strategies and Principles

    Program: Day 1: Overview lectures. Day 2 onward: a one-hour keynote lecture in the morning, followed by two forty-minute lectures; three forty-minute lectures in the afternoon; and a scheduled discussion. Each day will emphasize one of the themes: data representation; data exploration via filtering, sampling and aggregation; principles of design; visualization and cognition.

  • Big Data in Health Policy

    Program: The program will be organized around the main theme of causal inference in health policy. Inferring cause and effect relationships between disease exposures and health outcomes is of central importance in many real-world health policy problems involving big datasets, such as adverse medication effects, environmental exposures and cancer incidence, and long-term health outcomes in chronic disease populations. Randomized experiments are expensive, time-consuming, prone to subject selection biases, and often unethical. Hence, it is desirable to infer causal relationships using observational data arising from electronic databases. Causal inference topics that will be addressed through presentations, panel discussions, and small-group sessions include: graphical techniques for causal modeling; analytic techniques, including matching, propensity score and instrumental variable models; latent variable models; causal inference in longitudinal data.

    Secondary themes that will be addressed throughout the workshop include methods for linking large databases, data extraction techniques for clinical, genetics, and diagnostic imaging data, and data quality evaluation. These secondary themes have been selected because they often have a large impact on the ability to test causal hypotheses in large health databases.

  • Big Data for Social Policy

    The workshop will be organized around the following key topics, all focused on social problems and policy issues: urban analytics; privacy; official statistics; agent-based modeling and network models.

  • Statistical and computational challenges in networks, web mining and cybersecurity

    Program: 4.5 days, consisting of a 2-day short course followed by 2.5 days of research presentations. The short course, by Eric Kolaczyk (Boston University) on "Statistical Analysis of Network Data", will provide a primer for those less familiar with network data. Researchers from statistics, computer science and related areas will participate, giving a wide range of perspectives on current challenge areas in modelling related to networks.

  • Statistical theory for large-scale data

    Program: 5 days, with theory talks interspersed with domain of discourse talks. Speakers will be encouraged to pose open theoretical and methodological problems. There will be lots of time for discussion.

  • Challenges in environmental science

    Program: 5 days, alternating between sessions on new methodology and sessions on environmental science challenges that need new methodology.

  • Deep Learning

    Program: This workshop will invite paper submissions to be presented in oral or poster format. Through invited talks, a panel discussion and presentations by the participants, the workshop will showcase the latest advances in deep learning and address questions that are at the centre of current deep learning research (what roles do stochasticity/unsupervised learning/optimization play in deep learning, what are the desiderata for models of images/text/speech, etc.). Panel discussions will be led by the members of the organizing committee as well as by prominent representatives of the machine learning, computer vision and natural language processing communities.

  • Closing Conference

    We propose to hold this at AARMS, Dalhousie University, in conjunction with the Annual Meeting of the Canadian Statistical Sciences Institute, in the two days preceding the Annual Meeting of the Statistical Society of Canada, which in 2015 will be in Halifax, June 15-18. Overview lectures by members of the organizing committee will highlight the research generated by the thematic program.
