Fields Academy Shared Graduate Course: Statistical Methods for Big Genomic Data
Description
Instructor: Prof. Sanjeena Dang
Email:
Registration Deadline: September 14th, 2022
Lecture Times: Tuesday and Thursday | 2:35 - 3:55 PM (ET)
Office Hours: Friday | 10:00 - 11:00 AM (ET)
Course Dates: September 8th - December 1st, 2022
Mid-Semester Break: October 24th - 28th, 2022
Registration Fee: PSU Students - Free | Other Students - $500 CAD
Prerequisites: This course is intended for statistics graduate students. Knowledge of the following statistical concepts is expected: parameter estimation, hypothesis testing, confidence intervals, linear regression, and ANOVA.
Evaluation:
Assignments - 3 in total (15% each)
Project Proposal - 5%
Final Project - 50%
Capacity Limit: 50
Format: Hybrid.
Lectures will be held in person at Carleton University and will also be broadcast via Zoom for remote participation.
Course Description
The course will provide an understanding of the foundations of the statistical and computational tools that are routinely used to analyze large-scale genomics data. Building on a basic prior understanding of R, students will learn data preprocessing and visualization and the basics of statistical inference and modelling, and will gain an understanding of some advanced topics in statistical/machine learning and bioinformatics. Using omics datasets, students will learn the fundamentals of statistical techniques and their implementation in R/Bioconductor, and will develop the skills to critically evaluate and apply appropriate methodology, interpret results, draw conclusions, and make inferences. Some topics in efficient computing will also be explored.
Lecture notes and journal publications on relevant topics will provide the main content for the course. Certain topics from the following textbooks will be adapted in the course:
- R for Data Science by Wickham and Grolemund, 1st edition. The book is freely available through the authors’ website at: http://r4ds.had.co.nz.
- An Introduction to Statistical Learning with Applications in R by James, Witten, Hastie, and Tibshirani, 1st edition, 2013. The book is freely available from the authors' website: http://www-bcf.usc.edu/~gareth/ISL/
- Statistical Analysis of Next Generation Sequencing Data by Somnath Datta and Dan Nettleton.
- Modern Statistics for Modern Biology by Susan Holmes and Wolfgang Huber. An online edition is available from the authors' website: https://www.huber.embl.de/msmb/
Learning Management System (LMS): Fields Academy's CANVAS will be used for content delivery management, communications, grade sheets, assignments, etc.