2018 Fields Undergraduate Summer Research Program
July 3 to August 31, 2018
Description
***DEADLINE FOR STUDENT APPLICATION HAS NOW PASSED***
The Fields Undergraduate Summer Research Program (FUSRP) welcomes carefully selected undergraduate students from around the world for a rich mathematical research experience in July and August.
This year 43 students were selected from hundreds of applicants from mathematics-related disciplines to participate in 13 projects selected for the program.
This competitive initiative matches a group of up to five excellent students with faculty from Fields Principal Sponsoring or Affiliate Universities, visiting scientists, or researchers in industry.
Students accepted for the program will have most of their travel and on-site expenses covered by the Institute. Most of the program's funding supports student expenses, and all student placements are based at Fields.
Goal
To provide a high-quality and enriching mathematics research experience for undergraduates.
The project experience, quality mentorship, and team/independent work are intended to foster enthusiasm for continued research. Students work closely with each other and with their supervisor in a collaborative research team.
FUSRP in the News
The 2017 FUSRP was featured in the September 22, 2017 issue of the Globe and Mail. The article, written by Ivan Semeniuk, describes the program’s focus on teamwork and real-world problems.
Read the full article here.
Sponsors
2018 Students
Name (Last, First)  Home Institution  Country (Institution)  Nationality  Project # 
Alix, Gian Carlo  York University  Canada  Philippines  1 
Ang, Yan Sheng  National University of Singapore  Singapore  Singapore  10 
Bianco Prado, Bernardo  University of Minnesota  U.S.A.  Brazil  14 
Bryenton, Nicolas  University of Toronto  Canada  Canada  13 
Carden, Macdonald  Cornell University  U.S.A.  U.S.A.  4 
Chan, Hymn  Imperial College London  United Kingdom  Canada  8 
Cheng, Frances  University of British Columbia  Canada  Canada  9 
Eckels, Emily  Oxford College of Emory University  U.S.A.  U.S.A.  5 
González Sellán, Silvia  University of Oviedo  Spain  Spain  10 
Han, Xiayimei  Vanderbilt University  U.S.A.  China  14 
Holmes, Emma  McMaster University  Canada  Canada  5 
Hou, Zhaoran  University of Toronto  Canada  Canada  12 
Hu, Kevin  New York University  U.S.A.  U.S.A.  2 
Jin, Ende  University of Toronto  Canada  China  6 
Kazdan, Joshua  Stanford University  U.S.A.  U.S.A.  7 
Kesten, Jacob  Rice University  U.S.A.  U.S.A.  10 
Kirillov, Ilia  Moscow State University  Russia  Russia  2 
Kroell, Larissa  University of Innsbruck  Austria  Austria  7 
Lau, Michelle  Imperial College London  United Kingdom  Canada  5 
Leitch, Heather  Queen's University Belfast  United Kingdom  United Kingdom  4 
Li, Yufeng  McMaster University  Canada  China  1 
Li, Yunjing  University of Toronto  Canada  China  9 
Li, Tangling  University of Toronto  Canada  China  12 
Li, Jiayu  University of Toronto  Canada  China  14 
Liao, Lauren  University of California, San Diego  U.S.A.  U.S.A.  2 
Lin, Feiyang  Harvey Mudd College  U.S.A.  China  6 
Lin, Faner  University of Rochester  U.S.A.  China  12 
Martinez Alberga, Sofia  University of California, Riverside  U.S.A.  U.S.A.  7 
Melnyk, Oleksii  University of Oxford  United Kingdom  Ukraine  7 
Montes De Oca Osornio, Rodolfo  University of Guanajuato  Mexico  Mexico  8 
O'Gorman, Ronan  Trinity College Dublin  Ireland  Ireland  8 
Ouyang, Jing  Hong Kong University of Science and Technology  Hong Kong  China  11 
Qaisar, Waleed  University of Toronto  Canada  Canada  8 
Ragland, Colton  Ragland, Colton  U.S.A.  U.S.A.  4 
Shi, Xiaoying  University of California, Los Angeles  U.S.A.  China  11 
Tan, Johnson  McMaster University  Canada  Canada  10 
Tang, Tianchen  Hong Kong University of Science and Technology  Hong Kong  China  6 
Tenenbaum, Alexander  University of Toronto  Canada  Canada  7 
Tran, Huy  Harvard University  U.S.A.  Vietnam  8 
Tupker, Quinten  University of Cambridge  United Kingdom  Canada  11 
Wang, Mingxuan  Texas A&M University  U.S.A.  China  1 
Wei, Zeyu  University of Wisconsin-Madison  U.S.A.  China  9 
Xia, Hedi  University of California, Santa Barbara  U.S.A.  China  2 
Supervisors Names and Projects
Name of Lead Supervisor (Last, First)  Affiliation  Project Number and Title  Student Names 
Chignell, Mark  University of Toronto  Project 9: Monte Carlo Simulation of the Impact of Distributional Properties on the Effectiveness of Cluster Boosted Regression  Cheng, Chun Fang; Li, Yunjing; Wei, Zeyu
Georgiou, Konstantinos  Ryerson University  Project 13: The Symmetric Rendezvous Problem on a Triangle  Bryenton, Nicolas
Hamel, Angele  Wilfrid Laurier University  Project 7: Graphs, Chromatic Symmetric Functions, and Positivity  Kazdan, Joshua; Kroell, Larissa; Martinez Alberga, Sofia; Melnyk, Oleksii; Tenenbaum, Alexander
Hart, Bradd  McMaster University  Project 8: Model Theory and Free Probability  Chan, Hymn; Montes De Oca Osornio, Rodolfo; O'Gorman, Ronan; Qaisar, Waleed; Tran, Huy; Tan, Johnson
Jacobson, Alec  University of Toronto  Project 10: Morphological Operations on Discrete Surfaces  Ang, Yan Sheng; González Sellán, Silvia; Kesten, Jacob
Lee, Annie EnShuin  VerticalScope Inc.  Project 1: Big Data Information Extraction: Generating Seeded Named Entity Dictionary  Alix, Gian Carlo; Li, Yufeng; Wang, Mingxuan
Lynch, Geoffrey  Oanda (Canada) Corporation  Project 12: Optimal Hedging Strategies and Client Classification  Hou, Zhaoran; Li, Tangling; Lin, Faner
Morris, Kirsten  University of Waterloo  Project 4: Control and Estimation of Shallow Water Waves  Carden, Macdonald; Leitch, Heather; Ragland, Colton
Rein, Hanno  University of Toronto  Project 5: Development and Evaluation of a Hybrid Symplectic Integrator for Planetary Systems  Eckels, Emily; Holmes, Emma; Lau, Michelle
Rudie, Karen  Queen's University  Project 6: Discrete-Event Systems Model of a System's Ability to Protect Secrets  Jin, Ende; Lin, Feiyang; Tang, Tianchen
Rusjan, Pablo  University of Toronto  Project 11: Noninvasive Quantification of Brain Positron Emission Tomography Radioligand Binding  Ouyang, Jing; Shi, Xiaoying; Tupker, Quinten
Silberman, Gabby  Cerebri  Project 14: Unsupervised Learning Used in Dynamic Customer Journey  Han, Xiayimei; Li, Jiayu; Bianco Prado, Bernardo
Zhu, Hongmei  York University  Project 2: Biomedical Data Analysis in Transformed Spaces  Hu, Kevin; Kirillov, Ilia; Liao, Lauren
Applications
Supervisor/Project Submissions  Closed
Student Submissions  Closed  Extended to January 22
Important Dates
2017  
October 1  Call for Supervisor/Project Submissions. See here for details. 
November 15  Supervisor/Project Submission Deadline. 
2017  
December 15  Call for Student Applications. See here for details. Selected projects/supervisors are posted on the Fields website. 
January 22 (extended)  Student Application Deadline. 
February 1-15  Successful students are contacted and offered a placement in FUSRP. 
February 15-28  Names of successful/accepted students are posted on the Fields website. 
March-June  Students make appropriate travel/visa arrangements. 
July 3  Program begins at the Fields Institute at 9:00 am. 
August 31  Program concludes at 5:00 pm. 
Program Schedule
Week #  Date(s)  Activities 
Week 1  July 2  Check-in at Woodsworth College Residence (for those staying at Woodsworth).
Week 1  July 3  Program begins at the Fields Institute at 9:00 am. See 2018 Fields Undergraduate Summer Research Program Orientation and Welcome
Week 1  July 5  University of Toronto campus tour at 10:00 am.
Week 1  July 4-6  Students meet informally with their supervisor(s) and with other students in their group to work on their assigned research project.
Weeks 2 and 3  July 9-13 and 16-20  Students meet informally with their supervisor(s) and with other students in their group to work on their assigned research project. Site visit to the supervisor's host institution.
Weeks 4-8  July 23-August 24  Students meet informally with their supervisor(s) and with other students in their group to work on their assigned research project.
Week 4  July 25  Full-day group excursion (all students welcome) to Niagara Falls, Canada, organized and sponsored by the Fields Institute.
Week 4  July 27  2018 FUSRP Professional Development Workshop - Presentations
Week 5  August 1  Mid-program presentations: Each project will give a 10-minute presentation on the research done so far and where the team aims to be at the end of the program.
Week 5  TBD  PSUs Fair: The students will hear from representatives of the graduate programs of the Institute's Principal Sponsoring Universities.
Week 9  August 29  Mini-conference: The results of all summer student projects must be summarized and presented to other supervisor/student teams. Supervisors (or a qualified substitute) are required to make themselves available for the Mini-conference.
Week 9  August 31  Program concludes at 5:00 pm. Last day to check out from Woodsworth College Residence. We hope you have a safe trip home!
Post-Program  September 8  Scientific Report deadline.
Post-Program  September 11  Student Feedback Form deadline.
Research Projects
Project 1: Big Data Information Extraction: Generating Seeded Named Entity Dictionary
Supervisor: Annie EnShuin Lee, VerticalScope Inc.
Project Rationale and Objective
VerticalScope, Inc. owns and operates one of the most highly visited networks of online forums. These online discussion forums attract thousands of active discussions every day regarding products and services from a variety of businesses. Therefore, VerticalScope is in a unique position to glean insights from the vast amount of rich unstructured information generated by its forum users.
A core business initiative of VerticalScope is to identify names of products in these divergent verticals. Currently, the Data Science team at VerticalScope has supervised models for identifying automotive and powersports brands that are built from manually labelled training sets, which are time-consuming and expensive to produce. However, VerticalScope deals with many verticals, which differ in size, language, and content. Fitting a supervised model for each product/brand in each vertical can therefore be very expensive.
Three major challenges faced by VerticalScope when identifying products are: 1) the high cost of generating training data for each type of product, 2) covering a large and divergent range of product types from all verticals, and 3) disambiguating different types of entities based on context. Therefore, when training data is absent or scarce, an alternative solution is to use semi-supervised learning algorithms such as 1) seeding a dictionary, 2) pattern-based learning, and 3) pattern bootstrapping with a dictionary. The goal is to create a semi-supervised machine learning algorithm for identifying product names.
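The third option, pattern bootstrapping with a dictionary, can be illustrated with a toy sketch. The plain-Python example below is not VerticalScope's Spark Scala job: the corpus, the (previous word, next word) pattern shape, the `bootstrap` function, and the `min_pattern_count` threshold are all assumptions made for illustration.

```python
from collections import Counter

def bootstrap(sentences, seeds, rounds=2, min_pattern_count=2):
    """Alternate between learning context patterns and learning new entities."""
    entities = set(seeds)
    for _ in range(rounds):
        # Step 1: count (previous word, next word) contexts around known entities.
        patterns = Counter()
        for words in sentences:
            for i in range(1, len(words) - 1):
                if words[i] in entities:
                    patterns[(words[i - 1], words[i + 1])] += 1
        # Keep only contexts seen often enough to be trusted.
        trusted = {p for p, n in patterns.items() if n >= min_pattern_count}
        # Step 2: any word that fills a trusted context joins the dictionary.
        for words in sentences:
            for i in range(1, len(words) - 1):
                if (words[i - 1], words[i + 1]) in trusted:
                    entities.add(words[i])
    return entities

# Hypothetical forum snippets, already lower-cased and tokenized by whitespace.
corpus = [s.split() for s in [
    "i drive a honda every day",
    "i drive a toyota every day",
    "i drive a subaru every day",
    "i ride a yamaha on weekends",
    "we sold the truck last year",
]]

found = bootstrap(corpus, seeds={"honda", "toyota"})
print(sorted(found))
```

In this toy corpus the seeds establish the context ("a", "every"), which then pulls in "subaru". The word "yamaha" is missed because its context never co-occurs with a known entity, which hints at why pattern quality and seed coverage matter.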
Key Tasks Associated with the Project and Timeline (9 weeks)
 Week 1, 2 Training: Learn the pattern-bootstrapping theoretical framework and become familiar with running the Spark Scala code for experimentation
 Week 3, 4 Experiments Round 1: Establish a baseline methodology for the Automotive vertical and collect evaluation data for all other verticals
 Week 5, 6 Experiments Round 2: Create models for all other verticals, collecting training data for them as necessary
 Week 7, 8 Final Model Evaluation and Refinement: Measure the performance of each model and finalize models for all verticals
 Week 9 Technology Transfer and Final Wrap-up
Existing data or needed data collection efforts: Ground-truth gold annotations and a dictionary are provided for the Automotive vertical. The students will need to collect data for the other verticals.
Student responsibilities
 Understand the pattern-bootstrapping methodology for extracting dictionaries and patterns
 Learn to run the existing Spark job on a large-scale dataset and modify the Scala code as necessary
 Train and create machine learning models for the finalized task
Expected outcomes by the conclusion of the program: A clean set of named entity seeds from different Verticals
Project 2: Biomedical Data Analysis in Transformed Spaces
Supervisor: Hongmei Zhu, York University
Project Description
As exemplified by the rapid development of digital technology in hospitals, extracting clinically relevant information from medical data to aid disease detection has become an important part of clinical routine. Being able to process and analyze big data efficiently is a challenge. This challenge in turn requires the innovative development of optimal data representations that can efficiently reveal the desirable characteristics embedded in data. One key approach in this area is via mathematical transforms, such as Fourier, wavelet, or time-frequency analysis. Data can be transformed from the measured space to another space where features can be easily extracted. An optimal time-frequency representation is easy to interpret and excellent for data analysis and feature detection. The goal of this research project is to investigate various transforms in the context of a specific biomedical application.
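To make the idea of a time-frequency representation concrete, here is a minimal sketch using a short-time Fourier transform in NumPy. The synthetic signal, sampling rate, window length, and the helper `stft_magnitude` are illustrative assumptions; the project itself does not prescribe this particular transform or these parameters.

```python
import numpy as np

def stft_magnitude(signal, window_size, hop):
    """Magnitude of a short-time Fourier transform (rows: time frames)."""
    window = np.hanning(window_size)
    starts = range(0, len(signal) - window_size + 1, hop)
    frames = np.array([signal[s:s + window_size] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# Synthetic test signal (an assumption): 10 Hz for two seconds, then 40 Hz.
fs = 200                                  # samples per second
t = np.arange(800) / fs                   # four seconds of samples
signal = np.where(t < 2.0,
                  np.sin(2 * np.pi * 10 * t),
                  np.sin(2 * np.pi * 40 * t))

spectrogram = stft_magnitude(signal, window_size=100, hop=50)
freqs = np.fft.rfftfreq(100, d=1 / fs)    # frequency of each column in Hz
dominant = freqs[spectrogram.argmax(axis=1)]

print(dominant[0], dominant[-1])          # strongest frequency, first and last frame
```

A plain Fourier transform of the whole signal would show both frequencies but not when each occurs. The frame-by-frame view recovers the change over time, which is the kind of feature a transformed space can expose.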
Student Responsibilities
 Research and understand the background materials on transforms, feature extraction, and neural networks
 Understand the background of the clinical application
 Deliver presentations on progress
 Program with Matlab and Python to carry out the tasks
 Write a final report
Project Phases
 Weeks 1-2: Review the literature and become familiar with the computing environment
 Weeks 3-5: Implement new algorithms to obtain different data representations and test their effectiveness at revealing clinically important information
 Weeks 6-8: Further refine the selected methods and run them on the normal and diseased clinical data
 Week 9: Wrap up and write a final report.
Project 3: Constructing Effective Small-scale Group Testing Schemes With Side Constraints
Co-supervisors: Kevin Cheung and Brett Stevens, Carleton University
Project Description
Group testing uses results from combinatorial design theory to reduce the number of tests needed to identify individual defects in large groups of items. Over the years, many constructions with good asymptotic bounds have been obtained, with immediate applications in large-scale problems such as software testing and the human genome project. In addition to these successes in large-scale applications, group testing results have also found their way into small-scale applications such as skills development and knowledge diagnostics. One challenge in finding efficient group tests for small-scale applications is that the asymptotic bounds are not applicable when the number of items is small. Therefore, a different kind of search strategy for efficient testing schemes is required. The focus of the proposed project is on constructing small group testing schemes. In particular, the work will involve adapting existing constructions or proposing new ones, and implementing them to generate concrete testing schemes that can be immediately applied in real-world situations. The end result is a library of such constructions that can be queried through a web interface.
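As a small illustration of a non-adaptive scheme, the sketch below assigns each item a distinct pair of tests, so that 6 items are covered by 4 tests, and decodes by clearing every item that appears in a negative test. The construction, the function names, and the assumption of exactly one defective item are chosen for illustration and are not taken from the project.

```python
from itertools import combinations

def sperner_design(num_tests, weight):
    """Give each item a distinct `weight`-subset of tests (one column per item)."""
    return [frozenset(c) for c in combinations(range(num_tests), weight)]

def run_tests(design, defectives, num_tests):
    """A test is positive iff its pool contains at least one defective item."""
    return [any(t in design[i] for i in defectives) for t in range(num_tests)]

def decode(design, outcomes):
    """Keep only the items whose tests were all positive."""
    return [i for i, tests in enumerate(design) if all(outcomes[t] for t in tests)]

NUM_TESTS = 4
design = sperner_design(NUM_TESTS, weight=2)   # 6 items, 4 tests

# With a single defective item, decoding recovers it exactly.
recovered = []
for defective in range(len(design)):
    outcomes = run_tests(design, {defective}, NUM_TESTS)
    recovered.append(decode(design, outcomes))

print(len(design), "items tested with", NUM_TESTS, "pooled tests")
print(recovered)
```

Because no item's pair of tests contains another's, any non-defective item lands in at least one negative test and is cleared. With two or more defectives this particular design can return false positives, which is one reason small cases call for a dedicated search.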
The project will be in four phases:
1. Literature review and setup of computational environment (1-2 weeks)
2. Devising new methods of construction and testing their effectiveness (2-3 weeks)
3. Obtaining efficient implementations for selected methods (2-3 weeks)
4. Setting up a database of constructions and a web portal (1-2 weeks)
Selected references relevant to the project are:
 Chin, Francis Y. L. and Leung, Henry C. M. and Yiu, S. M. (2013): Non-adaptive complex group testing with multiple positive sets, Theory and Applications of Models of Computation, 505, 11-18.
 Porat, Ely and Rothschild, Amir (2011): Explicit non-adaptive combinatorial group testing schemes, IEEE Transactions on Information Theory, 57, 7982-7989.
 Indyk, Piotr and Ngo, Hung Q. and Rudra, Atri (2010): Efficiently decodable non-adaptive group testing, Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, 1126-1142.
 Ngo, Hung Q. and Porat, Ely and Rudra, Atri (2011): Efficiently decodable error-correcting list disjunct matrices and applications. In: Aceto L., Henzinger M., Sgall J. (eds) Automata, Languages and Programming. ICALP 2011. Lecture Notes in Computer Science, 6755, 557-568.
Project 4: Control and Estimation of Shallow Water Waves
Supervisor: Kirsten Morris, University of Waterloo
Project Description
The dynamics of many systems are modelled by partial differential equations (PDEs). Examples include acoustic noise, building ventilation, and lithium-ion cells. There are a number of theoretical and computational challenges in designing controllers and estimators for these systems. One issue is that in practice, controllers and estimators need to be calculated using an approximation to the PDE. The hope is that the controller/estimator will have the desired effect on the original system. However, sometimes neglected dynamics lead to substandard performance, or even instability, when the algorithm is implemented. The issues for diffusion systems, which are exponentially stable, are well understood and a number of approaches to computation are possible. However, many systems, such as those with convection-diffusion and coupled fluid-structures, are only asymptotically stable. The fact that a linear system can be asymptotically stable but not exponentially stable is one of the points that distinguishes PDE control and estimation from that for ordinary differential equations. The weak dissipation in these systems introduces theoretical and computational issues. Furthermore, the models for some of these systems are partial differential algebraic equations (PDAEs). Even establishing well-posedness of these models is not straightforward.
The focus of the project is to study these issues in the context of a relatively simple example, linear shallow water waves in one space dimension with dissipation. A particular formulation of this model has eigenvalues asymptoting to the imaginary axis and so is not exponentially stable. It is not known however whether the model is asymptotically stable. Also, the question of obtaining suitable approximation methods is open. Numerical approximations for lightly damped systems are notorious for introducing spurious eigenvalues. While not an issue for simulation, this can degrade the behaviour of controllers and estimators designed using them and a different approach is needed.
The students will first establish well-posedness of the equation on a Hilbert space with norm and variables related to system energy. They will learn about PDAEs as part of this effort. They will then attempt to show that the model is asymptotically stable. The key tool will be the Arendt-Batty Theorem. A final step will be to investigate numerical approximations. Recent work on discretization of port-Hamiltonian systems that preserve energy will be explored as part of this. If all this goes well, the students will design a controller and an estimator using an approximation and ascertain their performance with the original PDE. Another direction, if the students make good progress, is the investigation of nonlinear shallow water waves.
Project 5: Development and Evaluation of a Hybrid Symplectic Integrator for Planetary Systems
Supervisor: Hanno Rein, University of Toronto
Project Description
Over 3000 new planetary systems have been discovered in the last 20 years. These systems exhibit complex dynamic behaviours, many of which remain poorly understood. However, understanding the current dynamical state of these systems is important as it allows us to put our own Solar System into context, finding similarities and differences between our home, the only one we know for sure hosts life, and these strange new worlds.
This project is about the methods used for numerical simulations of both exoplanetary systems and our own Solar System. Having fast and accurate numerical tools to perform such simulations is crucial if we want to understand the dynamical architecture of these systems. The equations of motion that govern the dynamical evolution are well known. However, due to the inherently sequential nature of the problem, it remains incredibly hard to solve. Even highly optimized algorithms on modern CPUs might need months to integrate the ordinary differential equations of an 8-planet system because they need up to 100 billion individual timesteps. My research group has made substantial progress in recent years by developing the world's most efficient symplectic ODE integrators (WHFast, JANUS, IAS15) for planetary N-body simulations. However, one particular problem remains unsolved: adapting these symplectic ODE integrators so that they can efficiently integrate when collisions and close encounters occur between planets. Hybrid symplectic integrators that smoothly switch from one integration method to another are a promising solution.
In this project you will learn how geometric, and in particular symplectic, integration methods work. In the first two weeks you will write your own symplectic mixed-variable integrator for the Kepler problem and develop a testbench to measure its accuracy and speed. In weeks three and four, you will extend the integration algorithm to a hybrid scheme. You will be able to build on the progress that my group has already made by using our integrators as building blocks for your own hybrid scheme. You will develop a specific set of test problems to monitor the integrator's properties, such as energy conservation and symplecticity, in cases where close encounters happen. In the remaining weeks, you will try to optimize your integrator and merge it into the REBOUND integrator package. You will develop a user-friendly interface to your algorithm and a set of examples that will allow other people to learn how to use your algorithm for astrophysical applications. If everything goes well, you will start to write up the results in the form of a paper to be submitted to a peer-reviewed journal by the end of summer.
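The energy-conservation test mentioned above can be sketched on the Kepler problem. The example below uses the simple kick-drift-kick leapfrog scheme, which is symplectic but is not the mixed-variable integrator or any of the group's integrators. The units (GM = 1), initial conditions, step size, and function names are assumptions for illustration.

```python
import math

def acceleration(x, y):
    """Gravitational acceleration toward the origin, with GM = 1."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

def energy(x, y, vx, vy):
    """Kinetic plus potential energy per unit mass."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

def leapfrog_step(x, y, vx, vy, dt):
    """Kick-drift-kick leapfrog: a second-order symplectic scheme."""
    ax, ay = acceleration(x, y)
    vx, vy = vx + 0.5 * dt * ax, vy + 0.5 * dt * ay   # half kick
    x, y = x + dt * vx, y + dt * vy                   # full drift
    ax, ay = acceleration(x, y)
    vx, vy = vx + 0.5 * dt * ax, vy + 0.5 * dt * ay   # half kick
    return x, y, vx, vy

def euler_step(x, y, vx, vy, dt):
    """Explicit Euler: first order and not symplectic, for comparison."""
    ax, ay = acceleration(x, y)
    return x + dt * vx, y + dt * vy, vx + dt * ax, vy + dt * ay

def max_energy_error(step, dt=0.01, n_steps=10000):
    """Largest absolute energy deviation along an eccentric bound orbit."""
    state = (1.0, 0.0, 0.0, 0.8)
    e0 = energy(*state)
    worst = 0.0
    for _ in range(n_steps):
        state = step(*state, dt)
        worst = max(worst, abs(energy(*state) - e0))
    return worst

leapfrog_error = max_energy_error(leapfrog_step)
euler_error = max_energy_error(euler_step)
print(f"leapfrog: {leapfrog_error:.2e}, explicit Euler: {euler_error:.2e}")
```

The leapfrog energy error stays bounded and oscillates, while the explicit Euler error grows steadily. Bounded error over long integrations is the property that makes symplectic schemes attractive for planetary systems.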
Project 6: Discrete-Event Systems Model of a System's Ability to Protect Secrets
Supervisor: Karen Rudie, Queen's University
Project Description
To mathematically model problems that arise in cyberphysical systems, we use finite automata, from theoretical computing. Automata are 5-tuples that can be used to represent how processes move from state to state upon the occurrence of events. These systems are called discrete-event systems (DES) and they are nearly identical to directed graphs. A sequence of events in the DES is called a string and is comparable to a path in a graph. We assume that not all events in a string are observable to an agent. We model an agent's observations with a mapping ϕ, so that for a given string s generated by the system, ϕ(s) is the sequence of events that the agent observes.
Recently, researchers have been examining opacity, the ability of a system to prevent some set of strings (called secrets) from being distinguished from some other set of strings (nonsecrets). When a system is opaque then secrets cannot be distinguished from nonsecrets. Two strings being indistinguishable is like imagining a directed graph and asking if the edge labels that you can observe along one path from some vertex are the same as the ones you can observe along another path from the same vertex.
In this project we are interested in decentralized agents, each of whom has a potentially different observation mapping and we wish to determine a strategy for communicating event occurrences among the agents so that to some agent the system is nonopaque. The difficulty of the problem arises when one wants to find a minimal set of communications because what each agent observes (both directly and from communications sent to it) impacts what it can communicate to other agents. Moreover some DES communication problems are nonmonotonic so that finding a minimal solution is difficult: namely, it is not the case that the more an agent observes the more strings the agent can distinguish.
This problem can be used to model a group of hackers or invaders separated geographically (i.e., each with only partial observation) working in concert to steal private information (i.e., to render the system nonopaque).
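The observation mapping and the opacity check can be sketched for small finite sets of strings. In the example below, the event names, the two agents, and the functions `project` and `is_opaque` are hypothetical. Opacity is taken to mean that every secret string has the same observation as some nonsecret string, which is one common formalization and an assumption here.

```python
def project(string, observable):
    """Observation mapping: erase every event the agent cannot observe."""
    return tuple(event for event in string if event in observable)

def is_opaque(secrets, nonsecrets, observable):
    """True if every secret string looks like some nonsecret string."""
    nonsecret_views = {project(s, observable) for s in nonsecrets}
    return all(project(s, observable) in nonsecret_views for s in secrets)

# Hypothetical system behaviour, given directly as strings of events.
secrets = [("a", "c")]
nonsecrets = [("a",), ("c",)]

agent1 = {"a"}   # agent 1 observes only event a
agent2 = {"c"}   # agent 2 observes only event c

opaque_to_agent1 = is_opaque(secrets, nonsecrets, agent1)
opaque_to_agent2 = is_opaque(secrets, nonsecrets, agent2)

# Suppose agent 2 communicates every occurrence of c to agent 1.
opaque_after_communication = is_opaque(secrets, nonsecrets, agent1 | {"c"})

print(opaque_to_agent1, opaque_to_agent2, opaque_after_communication)
```

Each agent alone cannot tell the secret from a nonsecret, but one communicated event is enough for agent 1 to do so. Finding the smallest set of such communications is the harder question the project targets.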
In the first few weeks of the program, students will read about DES, opacity, and minimal communication problems. Then the students will start tackling the problem. Research in this area would likely appeal to students who enjoy discrete mathematics or graph theory.
Student Responsibilities:
 • Learn about opacity in discrete-event systems, monotonicity in the context of observations, and existing work on minimal communication among decentralized agents in discrete-event systems
 • Develop familiarity with proof methods in discrete-event systems theory
 • Develop algorithm(s) for communications between agents to ensure that to some agent the system is nonopaque
 • Try to get a minimal-communication algorithm
Expected outcomes by the conclusion of the program: an algorithm for (minimal) communication to ensure opacity and a report on the work that can be submitted for publication in a peer-reviewed journal.
Project 7: Graphs, Chromatic Symmetric Functions, and Positivity
Supervisor: Angele Hamel, Wilfrid Laurier University
Project Description
Chromatic symmetric functions, the focus of this project in combinatorics, sit at the intersection of graph theory and enumeration. These symmetric functions, defined in 1995 by Richard Stanley of MIT, generalize chromatic polynomials, well-known objects in graph theory that count the number of colorings of a particular graph. By contrast, the chromatic symmetric function of a graph is like a super chromatic polynomial: it not only counts the colorings, it counts the number of vertices of each color. This facilitates deeper knowledge of the structure of the graph and allows the exploitation of the machinery of classical symmetric function theory.
Symmetric functions are a longstanding part of algebraic combinatorics, and a fundamental question in symmetric function theory is whether a particular symmetric function, such as the chromatic symmetric function of a given graph, can be expressed with positive coefficients in terms of either the elementary or Schur symmetric function basis. The so-called e-positivity or Schur positivity of a graph is an interesting and challenging question. For this project we will look at e-positivity and Schur positivity for certain graph classes, exploiting the relationship between graph structure and the structure of tableaux (which define Schur functions). But which graphs to consider?
A number of graph classes, such as trees and cycles, have already been explored, and there has been particular focus on claw-free graphs, owing to claw-free conjectures in Stanley's original papers. In fact, a natural way to characterize graph classes is in terms of the induced subgraphs they are free of, and in graph theory, much effort has already been spent in characterizing the chromatic characteristics of graphs that are H-free, where H is some set of induced subgraphs. This literature is also at our disposal.
The key tasks and student responsibilities will be to generate examples of chromatic symmetric functions and related graphs, to explore relationships between graphs and tableaux through examples, to familiarize oneself with the proof techniques related to e-positivity and Schur positivity, to formulate conjectures, and to prove them. The students will also use packages in Sage to test examples for Schur and e-positivity. The outcome should be a journal publication.
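The difference between counting colorings and recording how many vertices receive each color can be seen by brute force on a small graph. The sketch below enumerates proper colorings of a three-vertex path with three colors and tallies them by color-count vector. This corresponds to the monomials of the chromatic symmetric function restricted to three variables. The function name and the example graph are assumptions for illustration, and the enumeration is only feasible for very small graphs.

```python
from collections import Counter
from itertools import product

def chromatic_symmetric_terms(num_vertices, edges, num_colors):
    """Tally proper colorings by how many vertices receive each color.

    The key (e_0, ..., e_{k-1}) stands for the monomial x_0^e_0 ... x_{k-1}^e_{k-1};
    the value is the number of proper colorings that produce it.
    """
    terms = Counter()
    for coloring in product(range(num_colors), repeat=num_vertices):
        if all(coloring[u] != coloring[v] for u, v in edges):
            exponents = tuple(coloring.count(c) for c in range(num_colors))
            terms[exponents] += 1
    return terms

# Path on three vertices: 0 - 1 - 2.
path_edges = [(0, 1), (1, 2)]
terms = chromatic_symmetric_terms(3, path_edges, num_colors=3)

total_colorings = sum(terms.values())   # the chromatic polynomial evaluated at 3
print(total_colorings)
print(dict(terms))
```

Summing all the counts recovers the ordinary chromatic polynomial value, while the individual keys retain the extra information about color class sizes that the chromatic symmetric function carries.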
Project 8: Model Theory and Free Probability
Co-supervisors: Bradd Hart, McMaster University; Ilijas Farah and Paul Skoufranis, York University
Project Description
Model theory is a branch of mathematical logic which studies classes of structures or models of theories in the sense of logic. Traditionally this logic has been classical first-order logic, and the techniques of first-order model theory have been used successfully in many areas of algebra, number theory and geometry. Recently a new logic called continuous logic has been developed, and it is more suited for applications in analysis. One area of particular interest is the study of von Neumann algebras (special algebras of operators acting on a Hilbert space) and free probability. We will look at the model theory of free group factors and ultraproducts of matrix algebras.
Students are not expected to know continuous logic, operator algebra or free probability; quick, intense short courses in all of these will be offered at the beginning of the project. A number of conjectures regarding the interaction of free probability and model theory already exist in the literature. We will survey these and work toward making a contribution to one or more of them by the end of the project. Past projects on similar topics have led to research publications in respected journals, and the goal of this project will be the same. Some familiarity with basic logic would be helpful and a solid grounding in linear algebra and analysis would be an asset.
For the project on Model theory and free probability, the two additional co-supervisors would be Ilijas Farah and Paul Skoufranis.
Project 9: Monte Carlo Simulation of the Impact of Distributional Properties on the Effectiveness of Cluster Boosted Regression
Supervisor: Mark Chignell, University of Toronto
Project Description
Clustering into patient types is a way of generating clinical predictions based on non-confidential summarized patient data (Chignell et al., 2013). Predictions made based on segmented patient types using cluster-boosted regression can improve on predictions made using confidential raw patient data, with studies reported by Rouzbahman et al. (2017) showing around a 2 percent improvement in prediction when predicting length of stay and death status in an intensive care unit, and when predicting the likelihood of a visit to an emergency department within one month of assessment for late-stage cancer patients.
The purpose of this project is to use Monte Carlo Simulation experiments to determine which distributional properties of multivariate data influence the magnitude of the boosting effect in cluster boosted regression. It is anticipated that this research should lead to a scientific paper that provides key insights into when and why cluster boosting is beneficial as well as providing criteria that can be used to determine which types of data set will stand to benefit more from the cluster boosting approach.
Key tasks include designing and running a series of Monte Carlo experiments to determine the multivariate properties of a data set that are linked to the amount of boosting that occurs when the data set is segmented with clustering prior to regression analysis. In the unlikely event that sufficient time remains after running and interpreting the Monte Carlo experiments, follow-on work would examine the impact of different clustering methods on the relationship between distributional properties of the data and the amount of cluster boosting benefit in terms of predictive accuracy. This research would likely be done with the MIMIC II intensive care data set available from physionet.org.
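A Monte Carlo experiment of this kind might be organized as in the sketch below, which varies one distributional property (the separation between two latent groups) and records the average reduction in in-sample error when cluster membership is added as a predictor. Treating cluster boosting as "add a cluster indicator to an ordinary least squares fit" is an assumption for illustration and may differ from the method in the cited papers. The data-generating model, the small k-means routine, and all names are likewise hypothetical, and Python with NumPy is used here in place of R.

```python
import numpy as np

def kmeans_labels(x, k, rng, iters=20):
    """A few Lloyd iterations starting from k randomly chosen data points."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

def mse_of_fit(design, y):
    """In-sample mean squared error of an ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(np.mean((y - design @ coef) ** 2))

def boosting_gain(separation, rng, n=200, k=2):
    """One simulated data set: baseline MSE minus cluster-augmented MSE."""
    group = rng.integers(0, k, size=n)                  # latent group
    x = rng.normal(size=(n, 2)) + separation * group[:, None]
    y = x @ np.array([1.0, -0.5]) + 2.0 * group + rng.normal(size=n)
    baseline = np.column_stack([np.ones(n), x])
    labels = kmeans_labels(x, k, rng)
    dummies = np.eye(k)[labels][:, 1:]                  # drop one level
    boosted = np.column_stack([baseline, dummies])
    return mse_of_fit(baseline, y) - mse_of_fit(boosted, y)

rng = np.random.default_rng(0)
gains = {
    sep: float(np.mean([boosting_gain(sep, rng) for _ in range(50)]))
    for sep in (0.5, 2.0, 4.0)
}
for sep, gain in gains.items():
    print(f"separation {sep}: mean in-sample MSE reduction {gain:.3f}")
```

Because the augmented model nests the baseline, the in-sample reduction is never negative. A study of predictive accuracy would evaluate on held-out data instead, where the gain can be negative.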
To carry out this project you should have some experience with statistical analysis and regression analysis in particular, and should be familiar with the R programming language and associated statistical and machine learning packages. Some experience with design of experiments and Monte Carlo simulation would be helpful but not necessary.
References
 Rouzbahman, M., Jovicic, A., and Chignell, M. (2017). Can Cluster-Boosted Regression Improve Prediction?: Death and Length of Stay in the ICU. IEEE Journal of Biomedical and Health Informatics, 21(3), 851-858.
 Chignell, M., Rouzbahman, M., Kealey, M.R., Yu, E., Samavi, R. and Sieminowski, T. (2013). Development of Non-Confidential Patient Types for Use in Emergency Medicine Clinical Decision Support. IEEE Security & Privacy, November/December, 28.
Project 10: Morphological Operations on Discrete Surfaces
Supervisor: Alec Jacobson, University of Toronto
Project Description
Morphological operations arise in computer graphics, computer vision and even physical processes such as crystal evolution. The simplest operations are dilation, where a shape grows outward, and erosion, where a shape shrinks inward. More complex operations can be designed by interweaving erosions and dilations. For example, dilating by a small amount and then eroding by the same amount leads to a shrink-wrap effect called the "closing". The closing removes small gaps and holes while otherwise staying close to the original surface. This is practically useful for preparing shapes for 3D printing. Closed-form expressions for morphological operations are known for simple shapes. For all other cases, morphological results must be computed numerically. Unfortunately, for a shape with a complex surface, the typical volumetric representation (a grid storing whether each point is inside or outside) must be very high resolution to avoid staircase-like defects.
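As a point of reference, the grid-based (volumetric) closing described above can be sketched in a few lines. This is a minimal illustration that assumes a binary 2D grid and a plus-shaped (4-neighbour) structuring element; the helper names `dilate`, `erode`, `closing`, and `block_with_hole` are hypothetical and do not come from the project.

```python
# Structuring element: the cell itself plus its four edge neighbours.
NEIGHBOURS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))


def _apply(grid, combine, outside):
    """Apply `combine` (any or all) over each cell's neighbourhood."""
    rows, cols = len(grid), len(grid[0])

    def cell(r, c):
        # Cells beyond the grid boundary take the value `outside`.
        return grid[r][c] if 0 <= r < rows and 0 <= c < cols else outside

    return [
        [int(combine(cell(r + dr, c + dc) for dr, dc in NEIGHBOURS))
         for c in range(cols)]
        for r in range(rows)
    ]


def dilate(grid):
    """A cell is inside if it or any neighbour is inside."""
    return _apply(grid, any, 0)


def erode(grid):
    """A cell stays inside only if it and all its neighbours are inside."""
    return _apply(grid, all, 1)


def closing(grid):
    """Dilation followed by erosion with the same structuring element."""
    return erode(dilate(grid))


def block_with_hole(size=9, lo=2, hi=6, hole=(4, 4)):
    """A solid square block with a single-cell hole, as a test shape."""
    grid = [
        [1 if lo <= r <= hi and lo <= c <= hi else 0 for c in range(size)]
        for r in range(size)
    ]
    grid[hole[0]][hole[1]] = 0
    return grid
```

Applying `closing` to `block_with_hole()` fills the one-cell hole and leaves the outline of the block unchanged. The staircase problem mentioned above comes from exactly this kind of representation: every boundary is forced onto the grid.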
In this project, we will alleviate this by defining morphological operations directly on the discrete surface representation of a shape. The shape's volume is implicitly defined via this boundary and will expand and contract as the surface undergoes morphological changes. It is already known that simple erosion and dilation correspond to flows along partial differential equations (PDEs). We will expand this theory by describing more complex combinations of operations as PDE flows. Initial derivations indicate that locally this corresponds to a filtering of a standard outward flow. We will also handle global interactions such as holes or gaps closing by detecting "collisions" during the flow and merging the surface mesh in response.
After an introduction to the related literature in the first week of the summer, students will begin implementing dilations and erosions as flows of polygons in two dimensions. We will validate these results for a variety of shapes. The middle weeks will be spent generalizing to more complex operations (e.g., closing) and to discrete surfaces in three dimensions. We will fabricate some of our results via 3D printing. The final weeks will be split between collision handling for global effects and preparing a report on the summer's work.
A robust, accurate, and efficient solution to this problem will have an immediate impact in computer graphics for shape approximation and simplification and in computational fabrication for predicting the 3D printability of a given shape. Any discipline relying on morphological operations for topological simplification or noise removal should also benefit; we may experiment with connections to post-filtering noisy 3D shapes created via deep learning.
The student undertaking this project will not only work on novel research with the intent to publish an academic paper, but will also have the opportunity to become familiar with the computer graphics and geometry processing scientific literature. This project brings together topics in numerical methods, sparse linear algebra, partial differential equations and computational geometry. In addition, the student will be invited to join the Dynamic Graphics Project (dgp) at the University of Toronto, Department of Computer Science, where we host weekly seminars and group research discussions with graduate students.
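A first attempt at the two-dimensional polygon flow mentioned above might look like the sketch below. It is a deliberately naive scheme under stated assumptions: the polygon is given counter-clockwise, each vertex moves along the normalized average of its two adjacent edge normals, and time stepping is explicit Euler. It does no collision detection or remeshing, and moving vertices (rather than offsetting edges) only approximates a true dilation. All function names are hypothetical.

```python
import math


def unit(x, y):
    """Normalize a 2-D vector; the zero vector is returned unchanged."""
    length = math.hypot(x, y)
    return (x / length, y / length) if length else (0.0, 0.0)


def vertex_normals(poly):
    """Outward unit normal at each vertex of a counter-clockwise polygon."""
    normals = []
    n = len(poly)
    for i in range(n):
        (px, py), (cx, cy), (qx, qy) = poly[i - 1], poly[i], poly[(i + 1) % n]
        # For a counter-clockwise edge (dx, dy), the outward normal is (dy, -dx).
        n_in = unit(cy - py, -(cx - px))
        n_out = unit(qy - cy, -(qx - cx))
        normals.append(unit(n_in[0] + n_out[0], n_in[1] + n_out[1]))
    return normals


def flow(poly, speed, total_time, steps=100):
    """Explicit Euler flow along vertex normals.

    speed > 0 grows the shape (dilation-like); speed < 0 shrinks it (erosion-like).
    """
    dt = total_time / steps
    for _ in range(steps):
        normals = vertex_normals(poly)
        poly = [
            (x + speed * dt * nx, y + speed * dt * ny)
            for (x, y), (nx, ny) in zip(poly, normals)
        ]
    return poly


def area(poly):
    """Signed area by the shoelace formula (positive for counter-clockwise)."""
    return 0.5 * sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1])
    )


def regular_polygon(n, radius):
    """Counter-clockwise regular n-gon centred at the origin."""
    return [
        (radius * math.cos(2 * math.pi * k / n),
         radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]
```

On a regular polygon the vertex normals are radial, so flowing with `speed=1.0` for `total_time=0.5` moves every vertex outward by 0.5. This is a convenient sanity check before trying shapes where concave corners and self-collisions make the naive scheme break down.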
Project 11: Noninvasive Quantification of Brain Positron Emission Tomography Radioligand Binding
Supervisor: Pablo Rusjan, University of Toronto
Project Description
Positron emission tomography (PET) provides a unique tool to study the biochemistry of the human brain in vivo. The quantitation of proteins in the brain with PET (e.g., neuroreceptors, enzymes) requires a radioligand and a kinetic model. Under some assumptions, multi-compartmental models can be used to model radioligands. Using the Laplace transform, a general solution to the differential equations of the compartmental model can be found: the temporal evolution of the radioactivity in an area of the brain can be described as the convolution of a sum of exponential functions with an input function. The input function can be measured from arterial blood samples, and the parameters of the model (exponents and amplitudes of the exponential functions) can be found using numerical algorithms (e.g., nonlinear least squares). However, acquisition of arterial blood samples is complex and unpleasant. The goal of this proposal is to use functions of biometrics (e.g., cerebral blood flow) to describe the input function without acquiring blood samples. The new formulation will increase the degrees of freedom of the problem, introducing new challenges for the minimization algorithms. Using real data and Monte Carlo simulations, the reliability of the outcome parameters will be evaluated and compared with the standard quantification.
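The convolution form of the model can be illustrated numerically. The sketch below is not the supervisor's Matlab or C++ code. It assumes a uniformly sampled input function, writes the impulse response as a sum of terms A_i * exp(-b_i * t) with positive decay rates b_i, and evaluates the convolution with the trapezoid rule; the names `impulse_response` and `tissue_curve` are hypothetical.

```python
import math


def impulse_response(t, amplitudes, rates):
    """Sum of exponentials: sum_i A_i * exp(-b_i * t)."""
    return sum(a * math.exp(-b * t) for a, b in zip(amplitudes, rates))


def tissue_curve(input_values, dt, amplitudes, rates):
    """Convolve the impulse response with a uniformly sampled input function.

    input_values[j] is the input function at time j * dt. The integral
    from 0 to t of h(t - s) * input(s) ds is approximated by the
    trapezoid rule on the sampling grid.
    """
    n = len(input_values)
    kernel = [impulse_response(k * dt, amplitudes, rates) for k in range(n)]
    curve = []
    for i in range(n):
        terms = [kernel[i - j] * input_values[j] for j in range(i + 1)]
        curve.append(dt * (sum(terms) - 0.5 * (terms[0] + terms[-1])))
    return curve
```

For a constant input of 1 and a single exponential with amplitude A and rate b, the result should approach (A / b) * (1 - exp(-b * t)), which gives a simple check of the implementation. Fitting the amplitudes and rates to measured data would then be a nonlinear least-squares problem over the output of `tissue_curve`.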
The project will involve three stages.
1. Background. Students will learn the necessary background concerning PET, radioligand quantification, compartmental models and differential equations.
2. Coding and research. Students will study the algorithms that I have already developed for solving kinetic models and will modify them to solve the new problem. The current code is in Matlab and C++. Using their modified code, real data and simulations, students will investigate the outcome parameters for radioligand quantification in terms of bias and variability relative to those obtained with the standard quantification.
3. Report. Students will compile their results in a detailed report.
Students working on this project should have a background in computer programming, specifically in Matlab. This project will introduce students to the mathematical modelling of pharmacokinetic systems and the use of optimization algorithms and Monte Carlo simulations applied to solve practical biomedical problems. Students will finish the project with improved scientific skills, a feeling for the application of mathematical analysis to solve practical real-world problems and experience with nuclear medicine data. The results of the project could greatly simplify the experimental design for quantitation of certain PET radioligands.
Project 12: Optimal Hedging Strategies and Client Classification
Co-supervisors: Geoffrey Lynch and Vlad Ciubotariu, OANDA (Canada) Corporation
Project Description
OANDA operates a trading platform which allows clients to buy and sell FX and CFDs. When a client makes a trade to buy or sell a product, the company must decide whether to (a) offset the exposure immediately by hedging the trade with a bank, (b) hold the exposure for a period of time, allowing the market to move before hedging the trade, or (c) hold the exposure indefinitely until the client closes the trade. OANDA currently handles hundreds of millions of transactions across the globe, initiated by hundreds of thousands of unique clients, and these decisions must be handled efficiently and effectively.
This project will apply deep learning and AI methodologies to research techniques for classifying clients, together with algorithmic hedging strategies that can be applied to each class for the purpose of optimal hedging. In addition, research will be required around a suitable metric for measuring the performance of candidate solutions so that a "best-in-class" solution can be recommended to the company.
There are three key tasks to this project. (1) Research potential solutions for client classification. These might include k-nearest neighbors, clustering, decision trees, gradient boosting, Bayesian techniques, principal component analysis, topological techniques, and others the research group wishes to investigate or develop. (2) Given each possible client classification technique, analyze various hedging strategies. These might include: hedge immediately, delayed hedging, no hedging, algorithmic hedging, minimum variance portfolio construction, or other innovative portfolio construction techniques, and potentially detailed order book analysis. The latter will involve time-series analysis of price data in conjunction with client data to develop a method that will best hedge future trades within each classified group. (3) Finally, research and develop a mathematically rigorous technique for measuring the performance of each possible strategy/classification combination so that we can recommend a proposal to the company for implementation.
Under the supervision of the Quantitative Trading Analytics team, the students will be required to devise a research plan, updated weekly as necessary, so that each of the three areas receives sufficient attention. Toward the end of the research period, the students will be required to put together a short presentation to explain their research findings and the justification for the solution that they propose. Students will work closely with the team since this is an active area of research.
At the end of the program, students will have developed their mathematical knowledge, gained an insight into how mathematics can be used in industry, and have a good working knowledge of how deep machine learning and AI can be used to solve real-world problems in finance.
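To make the comparison of hedging policies concrete, here is a toy simulation of the three basic choices (hedge immediately, hedge after a delay, never hedge). Everything in it is an assumption for illustration and not OANDA's model: one client buys one unit, the broker takes the other side, prices follow a driftless Gaussian random walk, and spreads and transaction costs are ignored. The function name `simulate_pnl` and the use of P&L variance as the performance metric are likewise hypothetical.

```python
import random
from statistics import pvariance


def simulate_pnl(hold_steps, n_paths=2000, vol=1.0, seed=0):
    """Broker P&L per simulated path for a single client buy of one unit.

    The broker is short one unit for `hold_steps` time steps before
    hedging, so its P&L is minus the price change over that period.
    hold_steps = 0 models an immediate hedge; a large value models
    holding the exposure until the client closes the trade.
    """
    rng = random.Random(seed)
    pnls = []
    for _ in range(n_paths):
        price_move = 0.0
        for _ in range(hold_steps):
            price_move += rng.gauss(0.0, vol)
        pnls.append(-price_move)
    return pnls


def risk(hold_steps):
    """One candidate risk metric: population variance of the simulated P&L."""
    return pvariance(simulate_pnl(hold_steps))
```

In this toy setting `risk(0)` is zero and the risk grows with the holding time, which is why any realistic metric also has to account for what the simulation leaves out, such as hedging costs and the information carried by different classes of clients.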
All the data will be accessed through secure Amazon Redshift servers managed by OANDA or secure data files. Data will be anonymized and adjusted where necessary to protect clients' identity and trading activities.
Project 13: The Symmetric Rendezvous Problem on a Triangle
Supervisor: Konstantinos Georgiou, Ryerson University
Project Rationale and Objective
In the classic Symmetric Rendezvous problem on a Line (SRL), two speed-1 robots at known distance 2 but unknown locations execute the same synchronous randomized algorithm, trying to minimize the expected rendezvous time, i.e. the expected time until the two robots meet. A longstanding conjecture is that the best possible rendezvous time is 4.25, with known upper and lower bounds being very close to that value.
In this project, we will study a variation of this classic problem, in which two robots reside at the vertices of an arbitrary triangle, whose edge lengths are known. Assuming that a rendezvous can occur only at a vertex, our goal will be to design synchronous symmetric randomized protocols whose objective is to minimize the expected rendezvous time. The main question we want to address is how the optimal rendezvous time changes in the new topology.
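A very small special case gives a feel for the triangle variant. The sketch below assumes an equilateral triangle with unit edges, so that every move (and every wait) takes one time unit, and it restricts attention to memoryless protocols in which each robot stays put with probability `p_stay` and otherwise moves to one of the other two vertices uniformly at random. These restrictions and the function names are assumptions made for illustration; the project itself concerns arbitrary triangles and more general protocols.

```python
import random


def meeting_probability(p_stay):
    """Probability that two robots on distinct vertices meet in one round."""
    p_move = (1.0 - p_stay) / 2.0  # probability of moving to one specific vertex
    # They meet at one of the two occupied vertices if one robot stays and the
    # other moves onto it, or at the third vertex if both move there.
    return 2.0 * p_stay * p_move + p_move ** 2


def expected_rounds(p_stay):
    """Expected number of rounds until rendezvous (geometric distribution).

    Requires p_stay < 1, otherwise the robots never meet.
    """
    return 1.0 / meeting_probability(p_stay)


def step(vertex, p_stay, rng):
    """One round for one robot on vertices 0, 1, 2."""
    if rng.random() < p_stay:
        return vertex
    return (vertex + rng.choice((1, 2))) % 3


def simulate(p_stay, trials=20000, seed=0):
    """Monte Carlo estimate of the expected rendezvous time in rounds."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a, b, rounds = 0, 1, 0
        while a != b:
            a = step(a, p_stay, rng)
            b = step(b, p_stay, rng)
            rounds += 1
        total += rounds
    return total / trials
```

Within this restricted family the per-round meeting probability is (1 - p)(1 + 3p)/4, which is maximized at p = 1/3 and gives an expected rendezvous time of 3 rounds. With unequal edge lengths the rounds no longer line up so neatly, which is part of what makes the general problem interesting.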
Key Tasks and Timeline
Numerous sophisticated techniques have been developed for proving upper and lower bounds for SRL and its variations. The main challenge in this project is either to translate these techniques to the topology induced by the new domain, or to introduce new algorithmic paradigms. It is expected that the literature review will span the first 2 weeks, while the rest will be devoted to the resolution of concrete research questions (4-5 weeks on upper bounds, and 2-3 weeks on lower bounds).
Student Responsibilities
The student is expected to have a background in Combinatorics, Probability Theory and Algorithmic Design. First, the student will become familiar with the Operations Research literature related to the proposed problem. Then, the student will propose and analyze the performance of new algorithms for solving the proposed optimization problem. If time allows, the student will also try to establish matching lower bounds. The performance analysis of the algorithms is expected to be theoretical. However, depending on the circumstances, computer simulations may also be needed.
Expected Outcomes by the Conclusion of the Program
The most anticipated outcome for the project will be the introduction of new algorithms for a generalization of a well-studied problem in Operations Research. Ideally, any new positive results will be accompanied by matching lower bounds.
Project 14: Unsupervised Learning Used in Dynamic Customer Journey
Supervisor: Gabby Silberman, Cerebri
Customer data collected by enterprises is on the rise and is being used to understand customer journeys for many purposes. Unsupervised learning studies how systems can learn to represent input patterns in a way that reflects the statistical structure of the overall collection of input patterns. This project uses unsupervised learning, where no target outputs are associated with the inputs. This project will assess data from many sources.
The project will combine categories of social media content (Twitter, Facebook, etc.) with enterprise-supplied data (transactions, CRM, correspondence, etc.) and explore the wrapper framework for unsupervised learning. We will identify the issues involved in developing a feature selection algorithm for unsupervised learning and make recommendations on how to tackle these issues. We will train a variety of machine learning models on different combinations of enterprise and external data, and compare the supervised and unsupervised solutions.
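The wrapper idea for unsupervised feature selection can be sketched as a loop: pick a candidate feature subset, cluster the data using only those features, score the resulting clustering, and keep the best-scoring subset. The example below is a minimal illustration and not Cerebri's toolkit. It assumes exhaustive search over subsets, a deterministic 2-means clustering, and a simple scatter-based score (the between-cluster share of total scatter); all names are hypothetical.

```python
import random
from itertools import combinations


def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def two_means(points, iters=50):
    """k-means with k = 2, seeded with the first point and the point farthest from it."""
    centres = [points[0], max(points, key=lambda p: dist2(p, points[0]))]
    labels = []
    for _ in range(iters):
        labels = [
            0 if dist2(p, centres[0]) <= dist2(p, centres[1]) else 1
            for p in points
        ]
        new = []
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append(mean_point(members) if members else centres[c])
        if new == centres:
            break
        centres = new
    return labels


def separation_score(points, labels):
    """Between-cluster share of total scatter, in [0, 1]; higher is better."""
    total = sum(dist2(p, mean_point(points)) for p in points)
    within = 0.0
    for c in (0, 1):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            centre = mean_point(members)
            within += sum(dist2(p, centre) for p in members)
    return 1.0 - within / total if total else 0.0


def wrapper_select(data, n_features):
    """Score every non-empty feature subset; return (best_subset, best_score)."""
    best = None
    for size in range(1, n_features + 1):
        for subset in combinations(range(n_features), size):
            projected = [tuple(row[i] for i in subset) for row in data]
            score = separation_score(projected, two_means(projected))
            if best is None or score > best[0]:
                best = (score, subset)
    return best[1], best[0]


def make_data(n=200, seed=0):
    """Synthetic rows: feature 0 has two clusters, feature 1 is pure noise."""
    rng = random.Random(seed)
    return [
        (rng.gauss(3.0 if i % 2 else -3.0, 1.0), rng.gauss(0.0, 1.0))
        for i in range(n)
    ]
```

On the synthetic data from `make_data()`, the wrapper should prefer the subset containing only the clustered feature. The example also hints at one of the issues the project plans to study: a score computed in spaces of different dimension is not automatically comparable across subsets, and exhaustive search does not scale beyond a handful of features.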
During the first week, the students will become familiar with the experimentation toolkit Cerebri has created to build and evaluate machine learning models, including data cleansing tools, run time scripts, and the execution and monitoring environment, and become familiar with unsupervised frameworks. In the subsequent two weeks, they will be working on diverse experiments, using existing models and various combinations of enterprise and social media data in the context of our regular agile research sprints. In weeks 4-5, the students will be asked to develop their own models and test them on the same data as in weeks 2-3. In week 6, they will prepare, under the supervision of Dr. Gabby Silberman and Cerebri research personnel, a presentation of their results and insights. These results will be shared with the R&D organization for their feedback. Weeks 6-7 will be spent designing the final set of experiments, including testing ideas on what other types of social media data may be useful to gather. During weeks 8-9, the students will run experiments using the feedback and new ideas from the previous sprint, and present results to the executive team.
Data
Cerebri has access to enterprise data, as well as tools for creating synthetic datasets for testing models and algorithms. We also have access to social media data we have used for early experimentation. If the students uncover other useful sources for social media data, we will assess the feasibility and proper avenues for gathering the information.
Students will learn and experiment with state-of-the-art machine learning tools with both supervised and unsupervised learning techniques, models and algorithms. They will get a sense of the potential for insights extracted from social media data to complement enterprise information for predicting a customer's behavior. Also, they will assess the relative effectiveness of machine learning models and algorithms, as well as the comparative cost and predictive value of various types of social media data. If warranted, a report/paper will be written to report on the project results to the broader community.
Directions
Click here for directions to the Fields Institute.
Directions from Woodsworth College Residence: walk south on St. George Street to College Street, turn right. Fields is the second building on your right.
Workshops and Conferences

2018 FUSRP Orientation and Welcome
July 3, 2018

2018 FUSRP Professional Development Workshop - Presentations
July 27, 2018

2018 FUSRP Mini-Conference
August 29, 2018