2018 Fields Undergraduate Summer Research Program
July 3 to August 31, 2018
Description
NEW: The 2018 projects are posted below for students' consideration and the student application form will open on December 15, 2017.
The Fields Undergraduate Summer Research Program (FUSRP) welcomes carefully selected undergraduate students from around the world for a rich mathematical research experience in July and August.
Each year, up to 25 students are selected from hundreds of applicants from mathematics-related disciplines to participate in the program.
This competitive initiative matches a group of up to five excellent students with faculty from Fields Principal Sponsoring or Affiliate Universities, visiting scientists, or researchers in industry.
Students accepted for the program will have most of their travel and on-site expenses covered by the Institute. Most of the program's funding supports student expenses and all student placements are based at Fields.
Goal
To provide a high-quality and enriching mathematics research experience for undergraduates.
The project experience, quality mentorship, and team/independent work are intended to foster enthusiasm for continued research. Students work closely with each other and with their supervisor in a collaborative research team.
FUSRP in the News
The 2017 FUSRP was featured in the September 22, 2017 issue of the Globe and Mail. The article, written by Ivan Semeniuk, describes the program’s focus on teamwork and real-world problems.
Read the full article here.
2018 Students
Name (Last, First) | Home Institution | Country (Institution) | Nationality | Project # |
To be announced |
2018 Supervisors and Projects
Name (Last, First) | Affiliation | Project | Students |
To be announced |
Applications
Supervisor/Project Submissions |
Closed |
Student Submissions |
Open (Apply Now) |
Important Dates
2017 | |
October 1 |
Call for Supervisor/Project Submissions. See here for details. |
November 15 | Supervisor/Project Submission Deadline. |
December 1 |
Call for Student Applications. See here for details. Selected projects/supervisors are posted on the Fields website. |
2018 | |
January 15 | Student Application Deadline. |
February 1-15 | Successful students are contacted and offered a placement in FUSRP. |
February 15-28 | Names of successful/accepted students are posted on the Fields website. |
March-June | Students make appropriate travel/visa arrangements. |
July 3 | Program launch. |
August 31 | Program adjourns. |
Program Schedule
Week # | Date(s) | Activities |
Week 1 | July 3 | Orientation and welcome (see the 2018 Fields Undergraduate Summer Research Program Orientation and Welcome page). |
Week 1 | July 5-7 | Students meet informally with their supervisor(s) and with other students in their group to work on their assigned research project. |
Week 2 | July 11 | Site visit to the University of Waterloo Faculty of Mathematics, with pre-arranged, complimentary door-to-door transportation to and from the University of Waterloo. This visit offers a unique opportunity to tour the Faculty of Mathematics with a small group of peers and gain insider knowledge about the Faculty directly from its professors and students. |
Weeks 2 and 3 | July 10-14 and 17-21 | Students meet informally with their supervisor(s) and with other students in their group to work on their assigned research project. Site visit to the supervisor's host institution. |
Week 3 | TBD |
Hands-on workshop hosted by the Fields Institute and presented by Toastmasters on presentations and public speaking. Description: Toastmasters International is a non-profit organization that collectively helps 270,000 members worldwide develop, improve, and practice their communication and leadership skills through public speaking. At Ryerson Toastmasters, we have taken the initiative of developing a comprehensive workshop based on the four pillars of communication.
Many companies worldwide, such as Google, Microsoft, Apple, and even Walt Disney, have developed branches of Toastmasters in their organizations to train their employees. Our goal as a student group is to reach the wider community and to help students at Ryerson and other universities in the GTA improve their communication and leadership skills so that they are better prepared for the workforce. |
Weeks 4-8 | July 24-August 25 | Students meet informally with their supervisor(s) and with other students in their group to work on their assigned research project. |
Week 4 | TBD | Full day group excursion (all students welcome) organized and sponsored by the Fields Institute to Niagara Falls, Canada. |
Week 5 | TBD | Mid-program presentations: Each project will give a 10-minute presentation on the research done so far, and where they aim to be at the end of the program. |
Week 5 | TBD | PSUs Fair: Students will hear from representatives of the graduate programs at the Institute's Principal Sponsoring Universities. |
Week 9 | TBD | Mini-conference: The results of all summer student projects must be summarized and presented to other supervisor/student teams. Supervisors (or a qualified substitute) are required to make themselves available for the Mini-conference. |
Week 9 | TBD | Last day to check-out from Woodsworth College. We hope you have a safe trip home! |
Post-Program | September 8 | Scientific Report deadline. |
Post-Program | September 11 | Student Feedback Form deadline. |
Research Projects
Project 1: Big Data Information Extraction: Generating Seeded Named Entity Dictionary
Supervisor: Annie En-Shuin Lee, VerticalScope Inc.
Project Rationale and Objective
VerticalScope, Inc. owns and operates one of the most highly visited networks of online forums. These discussion forums attract thousands of active discussions every day regarding products and services from a variety of businesses. VerticalScope is therefore in a unique position to glean insights from the vast amount of rich unstructured information generated by its forum users.
A core business initiative at VerticalScope is to identify product names across these divergent verticals. Currently, the Data Science team at VerticalScope has supervised models for identifying automotive and powersports brands, built from time-consuming and expensive manually labelled training sets. However, VerticalScope operates in many verticals, each with a different size, language, and content mix, so fitting a supervised model for each product/brand in each vertical can be very expensive.
Three major challenges VerticalScope faces when identifying products are: 1) the high cost of generating training data for each type of product, 2) covering the large, divergent range of products across all verticals, and 3) disambiguating different types of entities based on context. In the absence or scarcity of training data, an alternative is semi-supervised learning, such as 1) dictionary seeding, 2) pattern-based learning, and 3) pattern bootstrapping with a dictionary. The goal is to create a semi-supervised machine learning algorithm for identifying product names.
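To make the third approach concrete, here is a minimal, self-contained sketch of one bootstrapping round: a seed dictionary yields context patterns, which in turn harvest new entities. The toy corpus, the seed set, and the (left word, right word) pattern shape are our own illustrative assumptions, not VerticalScope's production pipeline, which runs on Spark/Scala at scale.

```python
# Toy sketch of dictionary-seeded pattern bootstrapping for product names.
corpus = [
    "i drive a civic and love it",
    "my civic gets great mileage",
    "i drive a corolla on weekends",
    "the corolla gets great mileage",
    "i drive a mustang on weekends",
]

seeds = {"civic", "corolla"}  # small seed dictionary of known product names

def extract_patterns(sentences, dictionary):
    """Collect (left word, right word) context patterns around known entities."""
    patterns = set()
    for s in sentences:
        toks = s.split()
        for i in range(1, len(toks) - 1):
            if toks[i] in dictionary:
                patterns.add((toks[i - 1], toks[i + 1]))
    return patterns

def apply_patterns(sentences, patterns, dictionary):
    """Harvest new candidate entities whose context matches a learned pattern."""
    found = set()
    for s in sentences:
        toks = s.split()
        for i in range(1, len(toks) - 1):
            if (toks[i - 1], toks[i + 1]) in patterns and toks[i] not in dictionary:
                found.add(toks[i])
    return found

# One bootstrapping round: seeds -> context patterns -> new dictionary entries.
patterns = extract_patterns(corpus, seeds)
new_entities = apply_patterns(corpus, patterns, seeds)
print(sorted(new_entities))  # ['mustang']
```

In a real system the harvested candidates would be scored and filtered before being added to the dictionary, and the loop repeated; here a single round already expands the seed set.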
Key Tasks Associated with the Project and Timeline (9 weeks)
- Week 1, 2 Training: Learn the pattern-bootstrapping theoretical framework and familiarize with running the Spark Scala code for experimentation
- Week 3, 4 Experiments Round 1: Establish baseline of methodology for Automotive Vertical and collect evaluation data for all other verticals
- Week 5, 6 Experiments Round 2: Create models for all other Verticals, collect training data for other verticals as necessary
- Week 7, 8 Final Model Evaluation and Refinement: Measure the performance of each model and finalize models for all verticals
- Week 9 Technology Transfer and Final Wrap-up
Existing data or needed data collection efforts: Ground-truth gold annotations and a dictionary are provided for the Automotive vertical. The student will need to collect data for the other verticals.
Student responsibilities
- Understand pattern-bootstrapping methodology for extracting dictionary and patterns
- Learn to run the existing Spark job on a large-scale dataset and modify the Scala code as necessary
- Train and create machine learning models for the finalized task
Expected outcomes by the conclusion of the program: A clean set of named entity seeds from different Verticals
Project 2: Biomedical data analysis in transformed spaces
Supervisor: Hongmei Zhu, York University
Project Description
As exemplified by the rapid development of digital technology in hospitals, extracting clinically relevant information from medical data to aid disease detection has become an important part of clinical routine. Processing and analyzing big data efficiently is a challenge, and it in turn requires the innovative development of optimal data representations that can efficiently reveal the desirable characteristics embedded in data. One key approach in this area is via mathematical transforms, such as Fourier, wavelet, or time-frequency analysis. Data can be transformed from the measured space to another space where features can be easily extracted. An optimal time-frequency representation is easy to interpret and excellent for data analysis and feature detection. The goal of this research project is to investigate various transforms in the context of a specific biomedical application.
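As a toy illustration of the transform idea (not any specific clinical pipeline), the sketch below uses a naive discrete Fourier transform, written out directly from its definition, to recover the dominant oscillation in a synthetic signal: in the measured (time) domain the structure is hard to read off the samples, while in the transformed domain it is a single peak.

```python
# Naive DFT illustrating how a transform exposes structure in a signal.
import cmath
import math

def dft(x):
    """Discrete Fourier transform from the definition (O(n^2), for clarity)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 64
# Synthetic signal: a strong 5-cycle component plus a weaker 12-cycle one.
signal = [math.sin(2 * math.pi * 5 * t / n) + 0.3 * math.sin(2 * math.pi * 12 * t / n)
          for t in range(n)]

spectrum = dft(signal)
# Dominant bin among the positive frequencies:
dominant = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
print(dominant)  # 5: the strongest component cycles 5 times over the window
```

Real applications would use the FFT (or wavelet/time-frequency transforms) rather than this quadratic-time definition, but the principle, choosing a space where the feature of interest becomes a peak, is the same.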
Project 3: Constructing effective small-scale group testing schemes with side constraints
Co-supervisors: Kevin Cheung and Brett Stevens, Carleton University
Project Description
Group testing uses results from combinatorial design theory to reduce the number of tests needed to identify individual defects in large groups of items. Over the years, many constructions with good asymptotic bounds have been obtained, with immediate applications in large-scale problems such as software testing and the human genome project. In addition to these successes, group testing results also find their way into small-scale applications such as skills development and knowledge diagnostics. One challenge in finding efficient group tests for small-scale applications is that the asymptotic bounds do not apply when the number of items is small, so a different kind of search strategy for efficient testing schemes is required. The focus of the proposed project is on constructing small group testing schemes. In particular, the work will involve adapting existing constructions or proposing new ones and implementing them to generate concrete testing schemes that can be immediately applied in real-world situations. The end result is a library of such constructions that can be queried through a web interface.
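For intuition, here is a minimal nonadaptive scheme for the easiest case, a single defective among 7 items, built from the classical binary-representation construction; the project targets far richer small-scale schemes with side constraints, so this is purely illustrative.

```python
# Nonadaptive group testing: 3 pooled tests identify 1 defective among 7 items.
n_items, n_tests = 7, 3
# tests[t] = the set of item indices (1..7) pooled into test t:
# test t contains every item whose index has bit t set.
tests = [{i for i in range(1, n_items + 1) if (i >> t) & 1} for t in range(n_tests)]

def run_tests(defective):
    """Outcome of each pooled test: positive iff it contains the defective."""
    return [defective in pool for pool in tests]

def decode(outcomes):
    """With one defective, the outcome bits spell out its index in binary."""
    return sum(1 << t for t, positive in enumerate(outcomes) if positive)

# 3 pooled tests suffice for any of 7 items (vs. 7 individual tests).
for d in range(1, n_items + 1):
    assert decode(run_tests(d)) == d
print("identified all", n_items, "cases with", n_tests, "tests")
```

The same idea scales as ceil(log2(n+1)) tests for one defective among n items; handling several defectives, errors, or side constraints is where the constructions cited below come in.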
The project will be in four phases:
1. Literature review and setup of computational environment (1 - 2 weeks)
2. Devising new methods of construction and testing their effectiveness (2 - 3 weeks)
3. Obtaining efficient implementations for selected methods (2 - 3 weeks)
4. Setting up a database of constructions and a web portal (1 - 2 weeks)
Selected references relevant to the project are:
- Chin, Francis Y. L. and Leung, Henry C. M. and Yiu, S. M. (2013): Non-adaptive complex group testing with multiple positive sets, Theory and Applications of Models of Computation, 505, 11-18
- Porat, Ely and Rothschild, Amir (2011): Explicit nonadaptive combinatorial group testing schemes, IEEE Transactions on Information Theory, 57, 7982-7989.
- Indyk, Piotr and Ngo, Hung Q. and Rudra, Atri (2010): Efficiently decodable non-adaptive group testing, Proceedings of the Twenty-first Annual ACM-SIAM Symposium on Discrete Algorithms, 1126-1142.
- Ngo, Hung Q. and Porat, Ely and Rudra, Atri (2011): Efficiently decodable error-correcting list disjunct matrices and applications. In: Aceto L., Henzinger M., Sgall J. (eds) Automata, Languages and Programming. ICALP 2011. Lecture Notes in Computer Science, 6755, 557-568.
Project 4: Control and Estimation of Shallow Water Waves
Supervisor: Kristen Morris, University of Waterloo
Project Description
The dynamics of many systems are modelled by partial differential equations (PDEs). Examples include acoustic noise, building ventilation, and lithium-ion cells. There are a number of theoretical and computational challenges in designing controllers and estimators for these systems. One issue is that in practice, controllers and estimators need to be calculated using an approximation to the PDE. The hope is that the controller/estimator will have the desired effect on the original system. However, sometimes neglected dynamics lead to sub-standard performance, or even instability, when the algorithm is implemented. The issues for diffusion systems, which are exponentially stable, are well understood, and a number of approaches to computation are possible. However, many systems, such as those with convection-diffusion and coupled fluid-structure dynamics, are only asymptotically stable. The fact that a linear system can be asymptotically stable but not exponentially stable is one of the points that distinguishes PDE control and estimation from that for ordinary differential equations. The weak dissipation in these systems introduces theoretical and computational issues. Furthermore, the models for some of these systems are partial differential algebraic equations (PDAEs). Even establishing well-posedness of these models is not straightforward.
The focus of the project is to study these issues in the context of a relatively simple example: linear shallow water waves in one space dimension with dissipation. A particular formulation of this model has eigenvalues approaching the imaginary axis and so is not exponentially stable. It is not known, however, whether the model is asymptotically stable. The question of obtaining suitable approximation methods is also open. Numerical approximations for lightly damped systems are notorious for introducing spurious eigenvalues. While not an issue for simulation, this can degrade the behaviour of controllers and estimators designed using them, and a different approach is needed.
The students will first establish well-posedness of the equation on a Hilbert space with norm and variables related to the system energy. They will learn about PDAEs as part of this effort. They will then attempt to show that the model is asymptotically stable; the key tool will be the Arendt-Batty theorem. A final step will be to investigate numerical approximations. Recent work on energy-preserving discretization of port-Hamiltonian systems will be explored as part of this. If all goes well, the students will design a controller and an estimator using an approximation and ascertain their performance on the original PDE. If the students make good progress, another direction is the investigation of nonlinear shallow water waves.
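For orientation, one common formulation of linearized shallow water waves with damping in one space dimension, together with the energy functional that typically supplies the Hilbert-space norm, reads as follows; this is our illustrative assumption, and the project's specific (PDAE) formulation may differ:

```latex
% One standard linearization (illustrative; not necessarily the project's form).
% h: surface height perturbation, u: velocity, H: mean depth,
% g: gravitational acceleration, d >= 0: damping coefficient.
\begin{align*}
  h_t + H\, u_x &= 0, \\
  u_t + g\, h_x &= -d\, u, \qquad x \in (0, L),
\end{align*}
% with the energy functional (a natural Hilbert-space norm for well-posedness):
\[
  E(t) = \tfrac{1}{2} \int_0^L \bigl( g\, h^2 + H\, u^2 \bigr)\, dx,
  \qquad
  \dot{E}(t) = -\,d \int_0^L H\, u^2 \, dx \;\le\; 0
\]
% where the boundary terms are assumed to vanish (energy-conserving
% boundary conditions), so the damping term is the only source of dissipation.
```

The inequality $\dot E \le 0$ gives stability of some kind, but since the dissipation acts only through $u$, it does not by itself yield exponential decay, which is exactly why tools such as the Arendt-Batty theorem are needed.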
Project 5: Development and evaluation of a hybrid symplectic integrator for planetary systems
Supervisor: Hanno Rein, University of Toronto
Project Description
Over 3000 new planetary systems have been discovered in the last 20 years. These systems exhibit complex dynamic behaviours, many of which remain poorly understood. However, understanding the current dynamical state of these systems is important as it allows us to put our own Solar System into context, finding similarities and differences between our home, the only one we know for sure hosts life, and these strange new worlds.
This project is about the methods used for numerical simulations of both exo-planetary systems and our own Solar System. Having fast and accurate numerical tools to perform such simulations is crucial if we want to understand the dynamical architecture of these systems. The equations of motion that govern the dynamical evolution are well known. However, due to the inherently sequential nature of the problem, it remains incredibly hard to solve. Even highly optimized algorithms on modern CPUs might need months to integrate the ordinary differential equations of an 8-planet system because they need up to 100 billion individual timesteps. My research group has made substantial progress in recent years by developing the world's most efficient symplectic ODE integrators (WHFast, JANUS, IAS15) for planetary N-body simulations. However, one particular problem remains unsolved: adapting these symplectic ODE integrators so that they can integrate efficiently when collisions and close encounters occur between planets. Hybrid symplectic integrators that smoothly switch from one integration method to another are a promising solution.
In this project you will learn how geometric, and in particular symplectic, integration methods work. In the first two weeks you will write your own symplectic mixed-variable integrator for the Kepler problem and develop a test-bench to measure its accuracy and speed. In weeks three and four, you will extend the integration algorithm to a hybrid scheme. You will be able to build on the progress that my group has already made by using our integrators as building blocks for your own hybrid scheme. You will develop a specific set of test problems to monitor the integrator's properties, such as energy conservation and symplecticity, in cases where close encounters happen. In the remaining weeks, you will optimize your integrator and merge it into the REBOUND integrator package. You will develop a user-friendly interface to your algorithm and a set of examples that will allow other people to learn how to use your algorithm for astrophysical applications. If everything goes well, you will start to write up the results in the form of a paper to be submitted to a peer-reviewed journal by the end of the summer.
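As a hint of what the first two weeks involve, here is a minimal sketch, entirely our own and far simpler than WHFast, JANUS, or IAS15, of a symplectic leapfrog (Stormer-Verlet) integrator for the planar Kepler problem, with the kind of basic energy-drift check a test-bench would perform.

```python
# Leapfrog (kick-drift-kick) integration of the planar Kepler problem.
import math

def accel(x, y):
    """Acceleration for a unit-mass test particle, GM = 1 central force."""
    r3 = (x * x + y * y) ** 1.5
    return -x / r3, -y / r3

def leapfrog(x, y, vx, vy, dt, steps):
    ax, ay = accel(x, y)
    for _ in range(steps):
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
        x += dt * vx                 # drift
        y += dt * vy
        ax, ay = accel(x, y)
        vx += 0.5 * dt * ax          # half kick
        vy += 0.5 * dt * ay
    return x, y, vx, vy

def energy(x, y, vx, vy):
    """Total specific energy: kinetic plus potential."""
    return 0.5 * (vx * vx + vy * vy) - 1.0 / math.hypot(x, y)

# Circular orbit at r = 1 (v = 1), integrated for many orbital periods.
x, y, vx, vy = 1.0, 0.0, 0.0, 1.0
e0 = energy(x, y, vx, vy)
x, y, vx, vy = leapfrog(x, y, vx, vy, dt=0.01, steps=10000)
drift = abs(energy(x, y, vx, vy) - e0)
print(drift < 1e-3)  # True: symplectic schemes keep the energy error bounded
```

A non-symplectic method of the same order (e.g., forward Euler) would show secular energy growth over this many steps; the bounded error here is the property that makes symplectic integrators attractive for long planetary integrations.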
Project 6: Discrete-Event Systems Model of a System's Ability to Protect Secrets
Supervisor: Karen Rudie, Queen's University
Project Description
To mathematically model problems that arise in cyber-physical systems, we use finite automata from theoretical computer science. Automata are 5-tuples that can be used to represent how processes move from state to state upon the occurrence of events. These systems are called discrete-event systems (DES) and are nearly identical to directed graphs. A sequence of events in a DES is called a string and is comparable to a path in a graph. We assume that not all events in a string are observable to an agent. We model an agent's observations with a mapping ϕ; then, for a given string s generated by the system, ϕ(s) is the sequence of events that the agent observes.
Recently, researchers have been examining opacity, the ability of a system to prevent some set of strings (called secrets) from being distinguished from some other set of strings (non-secrets). When a system is opaque then secrets cannot be distinguished from non-secrets. Two strings being indistinguishable is like imagining a directed graph and asking if the edge labels that you can observe along one path from some vertex are the same as the ones you can observe along another path from the same vertex.
In this project we are interested in decentralized agents, each of whom has a potentially different observation mapping and we wish to determine a strategy for communicating event occurrences among the agents so that to some agent the system is non-opaque. The difficulty of the problem arises when one wants to find a minimal set of communications because what each agent observes (both directly and from communications sent to it) impacts what it can communicate to other agents. Moreover some DES communication problems are non-monotonic so that finding a minimal solution is difficult: namely, it is not the case that the more an agent observes the more strings the agent can distinguish.
This problem can be used to model a group of hackers or invaders separated geographically (i.e., each with only partial observation) working in concert to steal private information (i.e., to render the system non-opaque).
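A tiny sketch of the underlying notions, with toy strings and event names of our own invention rather than the project's full automaton formalism: the observation mapping erases unobservable events, and the system is opaque to an agent when every secret string is observationally indistinguishable from some non-secret string.

```python
# Toy opacity check for a discrete-event system.
def phi(string, observable):
    """Natural projection: the agent sees only the observable events."""
    return tuple(e for e in string if e in observable)

def is_opaque(secrets, non_secrets, observable):
    """Opaque iff every secret looks like some non-secret to the agent."""
    seen_non_secrets = {phi(s, observable) for s in non_secrets}
    return all(phi(s, observable) in seen_non_secrets for s in secrets)

observable  = {"a", "b", "c"}            # event "u" is unobservable
secrets     = [("a", "u", "b"), ("a", "u", "c")]
non_secrets = [("a", "b"), ("a", "c")]

print(is_opaque(secrets, non_secrets, observable))          # True
# Communicating occurrences of "u" to the agent breaks opacity:
print(is_opaque(secrets, non_secrets, observable | {"u"}))  # False
```

The second call illustrates the project's theme in miniature: enlarging what an agent effectively observes, here by communication, can render the system non-opaque, and the research question is how to achieve that with a minimal set of communications.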
In the first few weeks of the program, students will read about DES, opacity, and minimal communication problems. Then the students will start tackling the problem. Research in this area would likely appeal to students who enjoy discrete mathematics or graph theory.
Project 7: Graphs, chromatic symmetric functions, and positivity
Supervisor: Angele Hamel, Wilfrid Laurier University
Project Description
Chromatic symmetric functions—the focus of this project in combinatorics—sit at the intersection of graph theory and enumeration. These symmetric functions, defined in 1995 by Richard Stanley of MIT, generalize chromatic polynomials, well-known objects in graph theory that count the number of colorings of a particular graph. By contrast the chromatic symmetric function of a graph is like a super chromatic polynomial—it not only counts the colorings, it counts the number of vertices of each color. This facilitates deeper knowledge of the structure of the graph, and allows the exploitation of the machinery of classical symmetric function theory.
Symmetric functions are a long-standing part of algebraic combinatorics, and a fundamental question in symmetric function theory is whether a particular symmetric function, such as the chromatic symmetric function of a given graph, can be expressed with positive coefficients in terms of either the elementary or Schur symmetric function basis. The so-called e-positivity or Schur positivity of a graph is an interesting and challenging question. For this project we will look at e-positivity and Schur positivity for certain graph classes, exploiting the relationship between graph structure and the structure of tableaux (which define Schur functions). But which graphs to consider?
A number of graph classes, such as trees and cycles, have already been explored, and there has been particular focus on claw-free graphs, owing to claw-free conjectures in Stanley's original papers. In fact, a natural way to characterize graph classes is in terms of the induced subgraphs they are free of, and much effort has already been spent in graph theory characterizing the chromatic properties of graphs that are H-free, where H is some set of induced subgraphs. This literature is also at our disposal.
The key tasks and student responsibilities will be to generate examples of chromatic symmetric functions and related graphs, to explore relationships between graphs and tableaux through examples, to familiarize oneself with the proof techniques related to e-positivity and Schur positivity, to formulate conjectures, and to prove them. The students will also use packages in Sage to test examples for Schur and e-positivity. The outcome should be a journal publication.
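As a small illustration of why the chromatic symmetric function is a "super" chromatic polynomial, the brute-force sketch below (our own toy code; serious experiments would use the Sage packages mentioned above) compares the path P4 with the star K_{1,3}: both trees share the chromatic polynomial k(k-1)^3, yet their chromatic symmetric functions differ.

```python
# Brute-force chromatic symmetric function data for two trees on 4 vertices.
from collections import Counter
from itertools import product

def csf_monomial_counts(vertices, edges, n_colors):
    """Count proper colorings, grouped by the partition of color multiplicities.

    This records the monomial-basis data of the chromatic symmetric function
    restricted to n_colors variables.
    """
    counts = Counter()
    for coloring in product(range(n_colors), repeat=len(vertices)):
        kappa = dict(zip(vertices, coloring))
        if all(kappa[u] != kappa[v] for u, v in edges):
            usage = tuple(sorted(Counter(coloring).values(), reverse=True))
            counts[usage] += 1
    return counts

V = [0, 1, 2, 3]
path = [(0, 1), (1, 2), (2, 3)]          # P4
star = [(0, 1), (0, 2), (0, 3)]          # K_{1,3}, the claw

p = csf_monomial_counts(V, path, 4)
s = csf_monomial_counts(V, star, 4)

print(sum(p.values()) == sum(s.values()))  # True: same chromatic polynomial value
print(p == s)                              # False: different symmetric functions
```

The star admits colorings where three vertices (the leaves) share one color, partition (3,1), while the path does not, so the color-multiplicity data separates the two graphs even though plain coloring counts cannot.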
Project 8: Model theory and free probability
Co-Supervisors: Bradd Hart, McMaster University and Ilijas Farah and Paul Skoufranis, York University
Project Description
Model theory is a branch of mathematical logic which studies classes of structures, or models of theories in the sense of logic. Traditionally this logic has been classical first-order logic, and the techniques of first-order model theory have been used successfully in many areas of algebra, number theory, and geometry. Recently a new logic called continuous logic has been developed; it is better suited to applications in analysis. One area of particular interest is the study of von Neumann algebras (special algebras of operators acting on a Hilbert space) and free probability. We will look at the model theory of free group factors and ultraproducts of matrix algebras.
Some familiarity with basic logic would be helpful and a solid grounding in linear algebra and analysis would be an asset.
Project 9: Monte Carlo Simulation of the Impact of Distributional Properties on the Effectiveness of Cluster Boosted Regression
Supervisor: Mark Chignell, University of Toronto
Project Description
Clustering into patient types is a way of generating clinical predictions based on non-confidential summarized patient data (Chignell et al., 2013). Predictions based on segmented patient types using cluster-boosted regression can improve on predictions made using confidential raw patient data, with studies reported by Rouzbahman et al. (2017) showing an improvement of around 2 percent in predicting length of stay and death status in an intensive care unit, and in predicting the likelihood of a visit to an emergency department within one month of assessment for late-stage cancer patients.
The purpose of this project is to use Monte Carlo Simulation experiments to determine which distributional properties of multivariate data influence the magnitude of the boosting effect in cluster boosted regression. It is anticipated that this research should lead to a scientific paper that provides key insights into when and why cluster boosting is beneficial as well as providing criteria that can be used to determine which types of data set will stand to benefit more from the cluster boosting approach.
Key tasks include designing and running a series of Monte Carlo experiments to determine the multivariate properties of a data set that are linked to the amount of boosting that occurs when the data set is segmented with clustering prior to regression analysis. In the unlikely event that sufficient time remains after running and interpreting the Monte Carlo experiments, follow-on work would examine the impact of different clustering methods on the relationship between the distributional properties of the data and the amount of cluster-boosting benefit in terms of predictive accuracy. This research would likely be done with the MIMIC II intensive care data set available from physionet.org.
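The core comparison behind cluster boosting can be sketched on synthetic data (a toy, pure-Python stand-in for the planned R-based Monte Carlo experiments): when two latent patient types follow different linear relationships, per-cluster regressions beat a single global regression.

```python
# Cluster-boosted regression on synthetic data with two latent "patient types".
import random
import statistics

random.seed(0)

# Two latent types with opposite x -> y relationships.
data = []
for _ in range(200):
    if random.random() < 0.5:
        x = random.gauss(-2, 0.5); y = 2 * x + random.gauss(0, 0.3)
    else:
        x = random.gauss(2, 0.5);  y = -2 * x + 8 + random.gauss(0, 0.3)
    data.append((x, y))

def ols(pairs):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    xs, ys = zip(*pairs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    return b, my - b * mx

def rmse(pairs, model):
    b, a = model
    return statistics.fmean((y - (b * x + a)) ** 2 for x, y in pairs) ** 0.5

# Segment with a 1-D 2-means (Lloyd's algorithm) on x, then fit per cluster.
c1, c2 = -1.0, 1.0
for _ in range(10):
    left  = [p for p in data if abs(p[0] - c1) <= abs(p[0] - c2)]
    right = [p for p in data if abs(p[0] - c1) >  abs(p[0] - c2)]
    c1 = statistics.fmean(x for x, _ in left)
    c2 = statistics.fmean(x for x, _ in right)

global_err  = rmse(data, ols(data))
cluster_err = (len(left) * rmse(left, ols(left))
               + len(right) * rmse(right, ols(right))) / len(data)
print(cluster_err < global_err)  # True: clustering first boosts the fit here
```

A Monte Carlo study in the project's sense would repeat this comparison while systematically varying the distributional properties (cluster separation, within-cluster noise, slope differences) to map out when the boosting effect appears.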
To carry out this project you should have some experience with statistical analysis and regression analysis in particular, and should be familiar with the R programming language and associated statistical and machine learning packages. Some experience with design of experiments and Monte Carlo simulation would be helpful but not necessary.
References
- Rouzbahman, M., Jovicic, A., and Chignell, M. (2017). Can Cluster-Boosted Regression Improve Prediction?: Death and Length of Stay in the ICU. IEEE Journal of Biomedical and Health Informatics, 21(3), 851-858.
- Chignell, M., Rouzbahman, M., Kealey, M.R., Yu, E., Samavi, R. and Sieminowski, T. Development of Non-Confidential Patient Types for Use in Emergency Medicine Clinical Decision Support. (2013). IEEE Security & Privacy, November/December, 2-8.
Project 10: Morphological operations on discrete surfaces
Supervisor: Alec Jacobson, University of Toronto
Project Description
Morphological operations arise in computer graphics, computer vision, and even physical processes such as crystal evolution. The simplest operations are dilation --- where a shape grows outward --- and erosion --- where a shape shrinks inward. More complex operations can be designed by interweaving erosions and dilations. For example, dilating by a small amount and then eroding by the same amount produces a shrinkwrap effect called the "closing". The closing removes small gaps and holes while otherwise staying close to the original surface. This is practically useful for preparing shapes for 3D printing. Closed-form expressions for morphological operations are known for simple shapes. For all other cases, morphological results must be computed. Unfortunately, for a shape with a complex surface, the typical volumetric representation --- a grid storing whether each point is inside or outside --- must be at very high resolution to avoid staircase-like defects.
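To make the definitions concrete, here is a toy grid-based implementation of dilation, erosion, and closing on a binary image, using a 4-neighbour structuring element of our choosing; the point above is precisely that such voxel grids need very high resolution, which is why the project works on the surface representation instead.

```python
# Grid-based morphological operations on a binary shape (set of pixels).
def neighbors(p):
    """The pixel itself plus its 4-neighbourhood (the structuring element)."""
    x, y = p
    return [(x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]

def dilate(shape):
    """Grow outward: every neighbour of a shape pixel joins the shape."""
    return {q for p in shape for q in neighbors(p)}

def erode(shape):
    """Shrink inward: keep a pixel only if its whole neighbourhood is inside."""
    return {p for p in shape if all(q in shape for q in neighbors(p))}

def closing(shape):
    """Dilate then erode by the same amount: fills small gaps and holes."""
    return erode(dilate(shape))

# A 5x5 filled square with a one-pixel hole in the middle.
square = {(x, y) for x in range(5) for y in range(5)}
holed = square - {(2, 2)}

print((2, 2) in closing(holed))  # True: the closing fills the small hole
print(closing(holed) >= holed)   # True: the original shape is preserved
```

Sharp corners and thin features of the true shape are exactly what this pixel grid misrepresents unless the resolution is very high, motivating the surface-flow formulation pursued in the project.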
In this project, we will alleviate this by defining morphological operations directly on the discrete surface representation of a shape. The shape's volume is implicitly defined via this boundary and will expand and contract as the surface undergoes morphological changes. It is already known that simple erosion and dilation correspond to flows governed by partial differential equations (PDEs). We will expand this theory by describing more complex combined operations as PDE flows. Initial derivations indicate that locally this corresponds to a filtering of a standard outward flow. We will also handle global interactions, such as holes or gaps closing, by detecting "collisions" during the flow and merging the surface mesh in response.
After an introduction to the related literature in the first week of the summer, students will begin implementing dilations and erosions as flows of polygons in two dimensions. We will validate these results for a variety of shapes. The middle weeks will be spent generalizing to more complex operations (e.g., closing) and to discrete surfaces in three dimensions. We will fabricate some of our results via 3D printing. The final weeks will be split between collision handling for global effects and preparing a report on the summer's work.
A robust, accurate, and efficient solution to this problem will have an immediate impact in computer graphics for shape approximation and simplification and in computational fabrication for predicting the 3D printability of a given shape. Any discipline relying on morphological operations for topological simplification or noise removal should also benefit; we may experiment with connections to post-filtering noisy 3D shapes created via deep learning.
The student undertaking this project will not only work on novel research with the intent to publish an academic paper, but will also have the opportunity to become familiar with the computer graphics and geometry processing scientific literature. This project will gather topics in numerical methods, sparse linear algebra, partial differential equations and computational geometry. In addition, the student will be invited to join the Dynamic Graphics Project (dgp) at the University of Toronto, Department of Computer Science, where we host weekly seminars and group research discussions with graduate students.
Project 11: Noninvasive Quantification of Brain Positron emission tomography Radioligand Binding
Supervisor: Pablo Rusjan, University of Toronto
Project Description
Positron emission tomography (PET) provides a unique tool to study the biochemistry of the human brain in vivo. The quantitation of proteins in the brain with PET (e.g., neuroreceptors, enzymes) requires a radioligand and a kinetic model. Under some assumptions, multi-compartmental models can be used to model radioligands. Using the Laplace transform, a general solution to the differential equations of the compartmental model can be found: the temporal evolution of the radioactivity in an area of the brain can be described as the convolution of a sum of exponential functions with an input function. The input function can be measured from arterial blood samples, and the parameters of the model (exponents and amplitudes of the exponential functions) can be found using numerical algorithms (e.g., nonlinear least squares). However, acquisition of arterial blood samples is complex and unpleasant. The goal of this proposal is to use functions of biometrics (e.g., cerebral blood flow) to describe the input function without acquiring blood samples. The new formulation will increase the degrees of freedom of the problem, introducing new challenges for the minimization algorithms. Using real data and Monte Carlo simulations, the reliability of the outcome parameters will be evaluated and compared with the standard quantification.
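A hedged sketch of the quantification idea described above: tissue activity is simulated as the discrete convolution of a toy input function with a single decaying exponential (a one-exponential stand-in; real models have more terms), and the rate constant is then recovered by a grid-search least squares in place of a proper nonlinear least-squares solver. All functional forms and values are illustrative only.

```python
# Toy radioligand quantification: simulate a tissue curve, then fit its rate.
import math

dt, n = 0.1, 300
t = [i * dt for i in range(n)]
input_fn = [ti * math.exp(-ti) for ti in t]   # toy plasma input function

def tissue_curve(k):
    """Tissue activity as the discrete convolution of input_fn with exp(-k t)."""
    return [dt * sum(input_fn[j] * math.exp(-k * (t[i] - t[j]))
                     for j in range(i + 1))
            for i in range(n)]

true_k = 0.5
measured = tissue_curve(true_k)   # noiseless "measured" curve for the demo

def sse(k):
    """Sum of squared residuals between measured and model curves."""
    return sum((m - c) ** 2 for m, c in zip(measured, tissue_curve(k)))

# Recover k with a 1-D grid search standing in for nonlinear least squares.
best_k = min((0.1 + 0.05 * i for i in range(20)), key=sse)
print(abs(best_k - true_k) < 1e-6)  # True
```

With noisy data and an input function itself parameterized by biometrics, as the project proposes, the fit gains degrees of freedom and the optimization landscape becomes harder, which is exactly the reliability question the Monte Carlo simulations will probe.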
The project will involve three stages.
1. Background. Students will learn the necessary background concerning PET, radioligand quantification, compartmental models, and differential equations.
2. Coding and research. Students will study the algorithms I have already developed for solving kinetic models and will modify them to solve the new problem. The current code is in Matlab and C++. Using their modified code, real data, and simulations, students will investigate the bias and variability of the outcome parameters for radioligand quantification with respect to those obtained with the standard quantification.
3. Report. Students will compile their results in a detailed report.
Students working on this project should have a background in computer programming, specifically in Matlab. The project will introduce students to the mathematical modelling of pharmacokinetic systems and to the use of optimization algorithms and Monte Carlo simulations for solving practical biomedical problems. Students will finish the project with improved scientific skills, a feel for applying mathematical analysis to practical real-world problems, and experience with nuclear medicine data. The results of the project could greatly simplify the experimental design for the quantitation of certain PET radioligands.
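The bias-and-variability comparison in stage 2 can be illustrated in the same spirit: refit simulated noisy curves many times and summarize the resulting estimates. Everything here (input function, noise level, parameter values) is invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0, 60, 241)
dt = t[1] - t[0]
cp = t * np.exp(-t / 2.0)                     # hypothetical input function

rates = [0.02, 0.3]                           # fixed, made-up rate constants
basis = np.column_stack(
    [np.convolve(np.exp(-b * t), cp)[: len(t)] * dt for b in rates]
)
true = np.array([0.1, 0.05])                  # made-up amplitudes
clean = basis @ true

# Monte Carlo: refit under additive Gaussian noise, then summarize.
fits = np.array(
    [np.linalg.lstsq(basis, clean + rng.normal(0, 0.01, len(t)), rcond=None)[0]
     for _ in range(500)]
)
bias = fits.mean(axis=0) - true               # systematic error (near zero here)
variability = fits.std(axis=0)                # spread of the estimates
```

The same summary statistics, computed for both the new formulation and the standard quantification, would support the comparison described above.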
Project 12: Optimal Hedging Strategies and Client Classification
Co-supervisors: Geoffrey Lynch and Vlad Ciubotariu, Oanda (Canada) Corporation
Project Description
OANDA operates a trading platform which allows clients to buy and sell FX and CFDs. When a client makes a trade to buy or sell a product, the company must decide whether to (a) offset the exposure immediately by hedging the trade with a bank, (b) hold the exposure for a period of time, allowing the market to move, before hedging the trade, or (c) hold the exposure indefinitely until the client closes the trade. OANDA currently handles hundreds of millions of transactions across the globe, initiated by hundreds of thousands of unique clients, so these decisions must be made efficiently and effectively.
This project will apply deep learning and AI methodologies to research techniques for classifying clients, and algorithmic hedging strategies that can be applied to each class for the purpose of optimal hedging. In addition, research will be required around a suitable metric for measuring the performance of candidate solutions so that a "best-in-class" solution can be recommended to the company.
There are three key tasks to this project. (1) Research potential solutions for client classification; these might include k-nearest neighbours, clustering, decision trees, gradient boosting, Bayesian techniques, principal component analysis, topological techniques, and others the research group wishes to investigate or develop. (2) For each candidate classification technique, analyze various hedging strategies; these might include immediate hedging, delayed hedging, no hedging, algorithmic hedging, minimum-variance portfolio construction, other innovative portfolio construction techniques, and potentially detailed order book analysis. The latter will involve time-series analysis of price data in conjunction with client data to develop a method that best hedges future trades within each classified group. (3) Research and develop a mathematically rigorous technique for measuring the performance of each strategy/classification combination, so that a proposal can be recommended to the company for implementation.
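As a toy illustration of the classification task, the sketch below groups synthetic clients with a bare-bones k-means clustering; the feature names and numbers are invented, and the actual project would draw on the much richer set of techniques listed above.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical client features: [average trade size, trades per day].
# Two synthetic behaviour groups stand in for real (anonymized) client data.
frequent = rng.normal([1.0, 50.0], [0.2, 5.0], size=(100, 2))
holders = rng.normal([10.0, 2.0], [2.0, 0.5], size=(100, 2))
X = np.vstack([frequent, holders])

def kmeans(X, k=2, iters=50):
    """Bare-bones k-means; fixed initialization for reproducibility."""
    centers = X[[0, -1]].astype(float)
    for _ in range(iters):
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels

labels = kmeans(X)
# Each recovered class could then be assigned its own hedging strategy,
# e.g. hedge the frequent traders immediately, hold the long-term holders.
```

A performance metric for task (3) would then compare the hedging outcomes across such class/strategy assignments.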
Under the supervision of the Quantitative Trading Analytics team, the students will be required to devise a research plan, updated weekly as necessary, so that each of the three areas receives sufficient attention. Toward the end of the research period, the students will be required to put together a short presentation explaining their research findings and the justification for the solution they propose. Students will work closely with the team, since this is an active area of research.
At the end of the program, students will have developed their mathematical knowledge, gained an insight into how mathematics can be used in industry and have a good working knowledge of how deep machine learning and AI can be used to solve real-world problems in finance.
All the data will be accessed through secure Amazon Redshift servers managed by OANDA or secure data files. Data will be anonymized and adjusted where necessary to protect clients' identity and trading activities.
Project 13: The Symmetric Rendezvous Problem on a Triangle
Supervisor: Konstantinos Georgiou, Ryerson
Project Rationale and Objective
In the classic Symmetric Rendezvous problem on a Line (SRL), two speed-1 robots at known distance 2 but unknown locations execute the same synchronous randomized algorithm, trying to minimize the expected rendezvous time, i.e. the expected time until the two robots meet. A long-standing conjecture is that the best possible rendezvous time is 4.25, with known upper and lower bounds being very close to that value.
In this project, we will study a variation of this classic problem, in which two robots reside at the vertices of an arbitrary triangle, whose edge lengths are known. Assuming that a rendezvous can occur only at a vertex, our goal will be to design synchronous symmetric randomized protocols whose objective is to minimize the expected rendezvous time. The main question we want to address is how the optimal rendezvous time changes in the new topology.
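To get a feel for the problem, here is a small Monte Carlo sketch of one naive symmetric protocol on the special case of an equilateral triangle with unit edges: each round, both robots move to one of the other two vertices chosen uniformly at random. This is not a candidate optimal protocol; per round the robots meet with probability 1/4 (they coincide only when both pick the third vertex), so the simulated expected time should be close to 4.

```python
import numpy as np

rng = np.random.default_rng(2)

def rendezvous_time(rng):
    """One trial: robots start at distinct vertices {0, 1} of a triangle
    with vertices {0, 1, 2}.  Both run the same randomized protocol:
    move to one of the other two vertices, chosen uniformly at random."""
    a, b, t = 0, 1, 0
    while a != b:
        a = rng.choice([v for v in range(3) if v != a])
        b = rng.choice([v for v in range(3) if v != b])
        t += 1  # each move traverses one unit-length edge
    return t

trials = [rendezvous_time(rng) for _ in range(20000)]
mean_time = sum(trials) / len(trials)  # empirically close to 4
```

Designing protocols that beat such naive baselines, and adapting them to arbitrary edge lengths, is exactly the kind of question the project addresses.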
Key Tasks and Timeline
Numerous sophisticated techniques have been developed for proving upper and lower bounds for SRL and its variations. The main challenge in this project is to either translate these techniques to the topology induced by the new domain, or introduce new algorithmic paradigms. It is expected that the literature review will span the first 2 weeks, while the rest of the program will be devoted to the resolution of concrete research questions (4-5 weeks on upper bounds, and 2-3 weeks on lower bounds).
Student Responsibilities
The student is expected to have a background in Combinatorics, Probability Theory, and Algorithm Design. First, the student will familiarize themselves with the Operations Research literature related to the proposed problem. Then, the student will propose new algorithms for the proposed optimization problem and analyze their performance. Time permitting, the student will also try to establish matching lower bounds. The performance analysis is expected to be theoretical; however, depending on the circumstances, computer simulations may also be needed.
Expected Outcomes by the Conclusion of the Program
The most anticipated outcome for the project will be the introduction of new algorithms for a generalization of a well-studied problem in Operations Research. Ideally, any new positive results will be accompanied by matching lower bounds.
Project 14: Unsupervised Learning used in Dynamic Customer Journey
Supervisor: Gabby Silberman, Cerebri
Customer data collected by enterprises is on the rise and is being used to understand customer journeys for many purposes. Unsupervised learning studies how systems can learn to represent input patterns in a way that reflects the statistical structure of the overall collection of input patterns. This project uses unsupervised learning, where no target output is associated with each input, and will assess data from many sources.
The project will use categories of social media content, such as Twitter, Facebook, etc. with enterprise-supplied data (transactions, CRM, correspondence, etc.) and explore the wrapper framework for unsupervised learning. We will identify the issues involved in developing a feature selection algorithm for unsupervised learning and make recommendations on how to tackle these issues. We will train a variety of machine learning models on different combinations of enterprise and external data, and compare the supervised and unsupervised solutions.
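A minimal sketch of the wrapper idea for unsupervised feature selection: each candidate feature subset is scored by the clustering it induces (here, within-cluster over total sum of squares; lower means tighter clusters), and the best-scoring subset wins. The synthetic data and the bare-bones k-means scorer are invented for illustration; with two informative features and one pure-noise feature, the wrapper should discard the noise.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
# Hypothetical data: two customer segments visible in features 0-1,
# while feature 2 is pure noise.
n = 200
X = np.vstack([rng.normal([0, 0], 1, (n, 2)), rng.normal([5, 5], 1, (n, 2))])
X = np.column_stack([X, rng.normal(0, 1, 2 * n)])

def kmeans(Z, k=2, iters=30):
    """Bare-bones k-means with a deterministic spread-out initialization."""
    centers = Z[[Z[:, 0].argmin(), Z[:, 0].argmax()]].astype(float)
    for _ in range(iters):
        labels = ((Z[:, None] - centers) ** 2).sum(-1).argmin(1)
        centers = np.array([Z[labels == j].mean(0) for j in range(k)])
    return labels

def score(Z):
    """Within-cluster / total sum of squares: lower = tighter clustering."""
    labels = kmeans(Z)
    wcss = sum(((Z[labels == j] - Z[labels == j].mean(0)) ** 2).sum()
               for j in range(2))
    return wcss / ((Z - Z.mean(0)) ** 2).sum()

# Wrapper search: evaluate the clustering criterion on every feature subset.
subsets = [s for r in (1, 2, 3) for s in combinations(range(3), r)]
best = min(subsets, key=lambda s: score(X[:, list(s)]))
```

Real wrapper methods face the issues flagged above, e.g. choosing a scale-invariant criterion and searching an exponential space of subsets, which is where the project's research questions lie.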
During the first week, the students will become familiar with the experimentation toolkit Cerebri has created to build and evaluate machine learning models, including data cleansing tools, run-time scripts, and the execution and monitoring environment, as well as with unsupervised learning frameworks. In the subsequent two weeks, they will work on diverse experiments, using existing models and various combinations of enterprise and social media data in the context of our regular agile research sprints. In weeks 4-5, the students will be asked to develop their own models and test them on the same data as in weeks 2-3. In week 6, they will prepare, under the supervision of Dr. Gabby Silberman and Cerebri research personnel, a presentation of their results and insights, which will be shared with the R&D organization for feedback. Weeks 6-7 will be spent designing the final set of experiments, including testing ideas on what other types of social media data may be useful to gather. During weeks 8-9, the students will run experiments using the feedback and new ideas from the previous sprint, and present results to the executive team.

Data: Cerebri has access to enterprise data, as well as tools for creating synthetic datasets for testing models and algorithms. We also have access to social media data we have used for early experimentation. If the students uncover other useful sources of social media data, we will assess the feasibility and proper avenues for gathering the information.
Students will learn and experiment with state-of-the-art machine learning tools spanning both supervised and unsupervised techniques, models, and algorithms. They will get a sense of the potential for insights extracted from social media data to complement enterprise information in predicting a customer's behaviour. They will also assess the relative effectiveness of machine learning models and algorithms, as well as the comparative cost and predictive value of various types of social media data. If warranted, a report or paper will be prepared to share the project results with the broader community.
Directions
Click here for directions to the Fields Institute.
Directions from Woodsworth College Residence: walk south on St. George Street to College Street, turn right. Fields is the second building on your right.