
August 4-6, 2008
Workshop on the Effective Use of High Performance Workstations In Scientific Computing

    Case Study: Using Coconut to Optimise non-Cartesian MRI for the Cell BE
    Christopher Kumar Anand, McMaster University and Optimal Computational Algorithms, Inc.
    Coauthors: Wolfram Kahl

    We will discuss the opportunities presented by novel multi-core architectures and the Cell BE in particular, and the challenges in harnessing their potential. We will then describe the support for code optimisation provided by the tool Coconut (COde CONstructing User Tool). Finally, we will demonstrate the use of this tool in optimising non-Cartesian magnetic resonance image reconstruction. While Coconut makes it easy to develop complicated implementations correctly, it does not automate the replacement of essentially serial algorithms with parallel algorithms. This is the fun, creative part of software development, which is highly variable from application to application. In the MRI case, we will show how to modify the basic reconstruction processes to tune the algorithm to the local memory bandwidth, cache size and processor FLOPs.


    Adaptive and high-order methods for American option pricing
    Duy Minh Dang, University of Toronto
    Coauthors: Christina Christara

    We present an adaptive and high-order method for pricing American vanilla options written on a single asset with constant volatility and interest rate. The high-order spatial discretization is based on optimal quadratic spline collocation (QSC) methods, while the timestepping is handled by the Crank-Nicolson (CN) technique. To control the spatial error, a mesh redistribution strategy based on an equidistribution principle is used. At each timestep, the solution of only one tridiagonal linear system is required. Numerical results demonstrate the efficiency of the adaptive techniques and the high-order discretization methods.
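    As a rough illustration of why each timestep is cheap (a sketch using a standard finite-difference discretization in place of quadratic spline collocation, not the authors' code): a Crank-Nicolson step for a one-dimensional parabolic problem reduces to a single O(n) tridiagonal solve via the Thomas algorithm.

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c and right-hand side d in O(n) operations."""
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

def cn_step(u, dt, dx):
    """One Crank-Nicolson step for u_t = u_xx on the interior grid,
    with zero Dirichlet boundary values."""
    r = dt / (2 * dx**2)
    n = len(u)
    a = np.full(n, -r)
    b = np.full(n, 1 + 2 * r)
    c = np.full(n, -r)
    a[0] = c[-1] = 0.0
    # explicit half of the step (boundary neighbours are zero)
    d = u + r * (np.roll(u, -1) - 2 * u + np.roll(u, 1))
    d[0] = u[0] + r * (u[1] - 2 * u[0])
    d[-1] = u[-1] + r * (u[-2] - 2 * u[-1])
    return thomas(a, b, c, d)
```

    One such solve per timestep is all the linear algebra a step requires, which is what makes the adaptive method fast.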


    Programming and Mathematical Software for Multicore Architectures
    Craig C. Douglas, University of Wyoming and Yale University

    Essentially all computers sold in 2008 will have multiple cores. The vast majority of multicore CPUs will have identical cores, though not all will. In 2009, several CPUs with different types of cores will become available. In this talk, I will briefly review multicore architectures, discuss different ways to program them, and provide pointers to mathematical software that now exists to help use them for scientific programming.

    This talk is based on the speaker's experiences teaching a semester course on the topic during the spring 2008 semester as a visiting professor at Texas A&M University.


    The Cell BE: architecture/programming, vector elementary functions, and other numerical software
    Robert F. Enenkel, IBM Canada Ltd.
    Coauthors: Christopher K. Anand, McMaster University and Optimal Computational Algorithms, Inc.

    My talk will give an introduction to the Cell BE heterogeneous multiprocessor computer, including topics such as hardware architecture, instruction set and programming considerations, floating point characteristics, and development/simulation tools. I will then discuss the design of SPU MASS, a high-performance SIMD/vector elementary function library for the Cell BE SPU. If time allows, I will also give a survey of other numerical libraries and/or scientific computing applications on Cell BE.
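    To illustrate the flavour of a SIMD/vector elementary function (a generic sketch, not the SPU MASS implementation): range reduction followed by a short, branch-free Horner polynomial, expressed here as NumPy array operations standing in for vector instructions.

```python
import numpy as np

def vec_exp(x):
    """Branch-free vector exp sketch: reduce x = n*ln2 + r with
    |r| <= ln2/2, evaluate a short polynomial for exp(r) by Horner's
    rule, then scale by 2**n. Every step is a data-parallel array
    operation of the kind a SIMD library maps onto vector hardware."""
    ln2 = np.log(2.0)
    n = np.rint(x / ln2)
    r = x - n * ln2
    # degree-6 Taylor polynomial for exp(r), valid on |r| <= ln2/2
    coeffs = [1 / 720, 1 / 120, 1 / 24, 1 / 6, 1 / 2, 1.0, 1.0]
    p = np.zeros_like(r)
    for c in coeffs:
        p = p * r + c
    return p * 2.0**n
```

    A production library would use a minimax polynomial and careful handling of extreme arguments, but the structure (reduce, polynomial, reconstruct, all without branches) is the same.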

    Developing Easy-to-Use ODE Software: The Associated Cost/Reliability Trade-offs
    Wayne Enright, University of Toronto

    In the numerical solution of ODEs, it is now possible to develop efficient techniques that compute approximate solutions which are convenient to interpret and use for researchers interested in accurate and reliable simulations of their mathematical models. To illustrate this, we have developed a class of numerical ODE solvers that deliver a piecewise polynomial as the approximate (or numerical) solution to an ODE problem.

    The methods are designed so that the resulting piecewise polynomial satisfies a perturbed ODE with an associated defect (or residual) that is directly controlled in a consistent way. We will discuss the quality/cost trade-off one faces when implementing and using such methods. Numerical results on a range of problems will be summarized for methods of orders five through eight.
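    The defect idea can be sketched as follows (an illustrative fragment, not the authors' solvers): given a piecewise-polynomial interpolant of a numerical solution, sample the residual p'(t) - f(t, p(t)) across a step.

```python
import numpy as np

def hermite_interpolant(t0, t1, y0, y1, f0, f1):
    """Cubic Hermite interpolant on [t0, t1] matching the values and
    slopes at both ends; returns callables for p(t) and p'(t)."""
    h = t1 - t0
    def p(t):
        s = (t - t0) / h
        return ((1 + 2 * s) * (1 - s)**2 * y0 + h * s * (1 - s)**2 * f0
                + s**2 * (3 - 2 * s) * y1 + h * s**2 * (s - 1) * f1)
    def dp(t):
        s = (t - t0) / h
        return (6 * s * (s - 1) * y0 / h + (1 - s) * (1 - 3 * s) * f0
                + 6 * s * (1 - s) * y1 / h + s * (3 * s - 2) * f1)
    return p, dp

def max_defect(f, t0, t1, y0, y1, samples=7):
    """Sample the defect p'(t) - f(t, p(t)) of the interpolant over the
    step; a defect-controlled solver keeps this below a tolerance."""
    p, dp = hermite_interpolant(t0, t1, y0, y1, f(t0, y0), f(t1, y1))
    ts = np.linspace(t0, t1, samples)
    return max(abs(dp(t) - f(t, p(t))) for t in ts)
```

    In a real solver the interpolant comes from the Runge-Kutta stages and the defect is sampled where it is known to peak, but the quantity being controlled is exactly this residual.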


    A Numerical Scheme for the Impulse Control Formulation for Pricing Variable Annuities with a Guaranteed Minimum Withdrawal Benefit (GMWB)
    Peter Forsyth, University of Waterloo
    Coauthors: Zhuliang Chen

    In this paper, we outline an impulse stochastic control formulation for pricing variable annuities with a Guaranteed Minimum Withdrawal Benefit (GMWB) assuming the policyholder is allowed to withdraw funds continuously. We develop a numerical scheme for solving the Hamilton-Jacobi-Bellman (HJB) variational inequality corresponding to the impulse control problem. We prove the convergence of our scheme to the viscosity solution of the continuous withdrawal problem, provided a strong comparison result holds. The scheme can be easily generalized to price discrete withdrawal contracts. Numerical experiments are conducted, which show a region where the optimal control appears to be non-unique.


    Dynamically-consistent Finite-Difference Methods for Disease Transmission Models
    Abba Gumel, University of Manitoba

    Models for the transmission dynamics of human diseases are often formulated as systems of nonlinear differential equations. The (typically) large size and high nonlinearity of some of these systems make their analytical solutions almost impossible to obtain. Consequently, robust numerical methods must be used to obtain approximate solutions. This talk addresses the problem and challenges of designing discrete-time models (finite-difference methods) that are dynamically consistent with the continuous-time disease transmission models they approximate (in particular, in preserving key properties of the continuous-time models such as positivity, boundedness, and correct bifurcation behaviour).
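    A classic toy example of the design problem (my illustration, not one of the speaker's disease models) is a Mickens-style nonstandard scheme for the logistic equation: treating the nonlinear term semi-implicitly keeps every iterate positive and bounded for any step size, where explicit Euler can go negative.

```python
def nsfd_logistic(y0, r, K, h, steps):
    """Nonstandard finite-difference scheme for y' = r*y*(1 - y/K):
    (y[n+1] - y[n])/h = r*y[n] - (r/K)*y[n]*y[n+1].
    Solving for y[n+1] gives a positivity-preserving update whose only
    fixed points, 0 and K, match the continuous model."""
    y = y0
    out = [y]
    for _ in range(steps):
        y = y * (1 + h * r) / (1 + h * r * y / K)
        out.append(y)
    return out
```

    With h = 10 and r = 1, explicit Euler started above the carrying capacity overshoots to a negative (meaningless) population, while this scheme decays monotonically to K; that is the kind of dynamic consistency the talk is about.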


    Portable Software Development for Multi-core Processors, Many-core Accelerators, and Heterogeneous Architectures
    Michael McCool, RapidMind/University of Waterloo

    New processor architectures, including many-core accelerators like GPUs, multi-core CPUs, and heterogeneous architectures like the Cell BE, provide many opportunities for improved performance. However, programming these architectures productively in a performant and portable way is challenging. We have developed a software development platform that uses a common SPMD parallel programming model for all these processor architectures. The RapidMind platform allows developers to easily create single-source, single-threaded programs with an existing, standard C++ compiler that can target all the processing resources in such architectures. When compared to tuned baseline code using the best optimizing C++ compilers available, RapidMind-enabled code can demonstrate speedups of over an order of magnitude on x86 dual-processor quad-core systems and two orders of magnitude on accelerators.


    Deterministic and stochastic models in synthetic biology
    David McMillen, Dept. of Chemical and Physical Sciences, University of Toronto Mississauga

    Synthetic biology is an emerging field in which we seek to design and implement novel biological systems, or change the properties of existing ones. The field is closely related to systems biology, and one of its recurring themes is the use of cellular models as an aid in design: we want to gain some predictive insight into cellular processes, rather than simply relying on trial and error.

    A variety of modelling methods have been used to approach this problem, and I will present a non-specialist introduction to the biological issues, and a survey of some popular techniques currently being applied in biological modelling work.


    Modern Radiation Therapy: Image is Everything
    Douglas Moseley, Princess Margaret Hospital
    Coauthors: Michael B. Sharpe, David A. Jaffray

    Recent technical advances in Image-Guided Radiation Therapy (IGRT) have motivated many applications in high-performance computing. These include the addition of a kV x-ray tube and an amorphous silicon flat-panel detector to the gantry of a medical linear accelerator. A series of 2D kV projection images of the patient is captured as the gantry rotates. These 2D projection images are then reconstructed into a 3D volumetric image using a filtered back projection technique. The algorithm, a modified Feldkamp method, is CPU-intensive but highly parallelizable and hence can take advantage of multi-processor architectures. At Princess Margaret Hospital, about 25% of our patients receive daily on-line image guidance.

    In this talk an introduction to the field of radiation therapy will be presented. The back-projection algorithm will be discussed, including an extension of the reconstruction algorithm that, in the presence of patient motion, allows for the generation of 4D images (3D in space and 1D in time). Other computationally intensive applications in radiation therapy, such as rigid and non-rigid image registration, will be introduced.
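    The core of the reconstruction step can be illustrated with a small sketch (my own, simplified to 2D parallel-beam geometry rather than the cone-beam Feldkamp variant): each projection is ramp-filtered and back-projected independently, which is why the method parallelizes so well across views.

```python
import numpy as np

def filtered_backprojection(sinogram, angles_deg):
    """2D parallel-beam filtered back projection: ramp-filter each
    projection in the Fourier domain, then smear ("back-project") it
    across the image grid. Each view is independent of the others,
    so the loop below is trivially parallel across projections."""
    n_angles, n_det = sinogram.shape
    ramp = np.abs(np.fft.fftfreq(n_det))          # ideal ramp filter
    filtered = np.real(np.fft.ifft(np.fft.fft(sinogram, axis=1) * ramp,
                                   axis=1))

    c = n_det / 2.0                               # rotation centre (pixels)
    x = np.arange(n_det) - c
    X, Y = np.meshgrid(x, x)
    image = np.zeros((n_det, n_det))
    for proj, theta in zip(filtered, np.deg2rad(angles_deg)):
        # detector coordinate of every pixel for this view
        t = X * np.cos(theta) + Y * np.sin(theta) + c
        image += np.interp(t, np.arange(n_det), proj)
    return image * np.pi / n_angles
```

    The clinical cone-beam case adds depth weighting and a 3D backprojection geometry, but the per-view independence that makes it GPU- and multi-core-friendly is the same.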


    A User-Friendly Fortran 90/95 Boundary Value ODE Solver
    Paul Muir, Saint Mary's University
    Coauthors: Larry Shampine

    This talk will describe a recently developed user-friendly Fortran 90/95 software package for the numerical solution of boundary value ordinary differential equations (BVODEs). This package, called BVP_SOLVER, was written by the author and Larry Shampine, of Southern Methodist University, and has evolved from an earlier Fortran 77 package called MIRKDC, written by the author and Wayne Enright, of the University of Toronto. The new solver takes advantage of a number of language features of Fortran 90/95 to provide a substantially simpler experience for the user, while at the same time employing the underlying robust, high-quality algorithms of the original MIRKDC code. BVP_SOLVER has a simple, flexible interface related to that of the Matlab BVODE solver, bvp4c (Kierzenka and Shampine), while retaining the fast execution speed of Fortran. BVP_SOLVER also implements several numerical algorithms that represent improvements over those employed in MIRKDC. Examples are presented to demonstrate the capabilities of the new solver. Several current projects associated with enhancing the capabilities of BVP_SOLVER will also be discussed.


    HPC, an effective tool for three-dimensional simulations of nonlinear internal waves
    Van Thinh Nguyen, University of Waterloo
    Coauthors: K.G. Lamb

    Stratified flows over topographies present challenging geophysical fluid dynamics problems with far-reaching implications for circulation and mixing in oceans and the Great Lakes. The complexity of natural boundaries and topographies means that a three-dimensional model is often necessary. In addition, the hydrostatic approximation breaks down for small-scale processes such as the steepening, formation and breaking of nonlinear internal and non-hydrostatic waves; therefore, a non-hydrostatic three-dimensional model is required to study such phenomena. Finally, in order to simulate small-scale processes, high resolutions are necessary, and as a result this requires the use of High Performance Computers (HPC).

    Nonlinear internal waves play an important role in the redistribution of nutrients, pollutants and sediments in oceans and lakes. In this study, large-scale simulations of nonlinear internal waves generated by tidal and wind forcing over three-dimensional topographies in the St. Lawrence Estuary (SLE) and Lake Erie are described. The simulations are based on the MIT general circulation model (MITgcm) developed by Marshall et al. (1997), with some modifications for a quasi-two-layer fluid and open boundary conditions. The code is designed using a multi-level parallel decomposition to facilitate efficient execution on all foreseeable parallel computer architectures and processor types (Wrappable Application Parallel Programming Environment Resources Infrastructure).

    The simulation domain is divided into a number of subdomains, or tiles. Each tile consists of an interior region and an overlap region shared with adjacent tiles. Each tile is owned by an individual processor, and a processor may own several tiles. The owning processor performs the arithmetic operations associated with its tiles. Except for periods of communication or coordination, each processor computes autonomously, working only with data from the tile (or tiles) that it owns. The code has been scaled on different parallel platforms (shared and distributed memory) provided by SHARCNET (Ontario) and RQCHP (Quebec).
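    The tile-plus-overlap pattern can be sketched in a few lines (a schematic serial analogue, not the MITgcm code): halo columns are refreshed from neighbouring tiles' interiors, after which each tile can be updated independently.

```python
import numpy as np

def exchange_halos(tiles, halo=1):
    """Refresh the overlap columns of a 1D row of tiles: each tile's
    halo is overwritten with the adjacent tile's interior edge, the
    step that stands in for inter-processor communication."""
    for left, right in zip(tiles[:-1], tiles[1:]):
        right[:, :halo] = left[:, -2 * halo:-halo]   # left neighbour's edge
        left[:, -halo:] = right[:, halo:2 * halo]    # right neighbour's edge

def jacobi_sweep(tile):
    """One 5-point Jacobi relaxation on the interior of a tile; after a
    halo exchange this needs no data outside the tile."""
    new = tile.copy()
    new[1:-1, 1:-1] = 0.25 * (tile[:-2, 1:-1] + tile[2:, 1:-1]
                              + tile[1:-1, :-2] + tile[1:-1, 2:])
    return new
```

    With halos up to date, sweeping the tiles separately reproduces a sweep of the undecomposed grid on every interior point, which is exactly why the decomposed code can run each tile on its own processor between communication phases.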

    The numerical results are compared with data from direct observations in the St. Lawrence Estuary (SLE) and Lake Erie. In addition, a comparison of performances between different platforms will be presented.


    Real-Time Big Physics Applications In National Instruments LabVIEW
    Darren Schmidt, National Instruments
    Coauthors: Bryan Marker, Lothar Wenzel, Louis Giannone, Bertrand Bauvir, Toomas Erm

    Many big physics applications require an enormous amount of computational power to solve large mathematical problems with demanding real-time constraints. Typically, the mathematical model is fed data from real-world acquisition channels and required to generate outputs in less than 1 ms for problem sizes of dimension 1,000 to 100,000 and beyond.

    We report on an approach for tokamaks and extremely large telescopes that utilizes an application development environment called LabVIEW. This environment offers algorithm development that naturally exposes parallel processing through its graphical programming language and takes advantage of multi-core processors without user intervention. We show that the required computational power is accessible through LabVIEW via multi-core implementations targeting a combination of processors including CPUs, FPGAs, and GPUs deployed on standard PCs, workstations, and real-time systems. Several benchmarks illustrate what is achievable with such an approach.


    The Numerical Mathematics Consortium - Establishing a Verification Framework for Numerical Algorithm Development
    Darren Schmidt, National Instruments

    The Numerical Mathematics Consortium (NMC) is a group of numerics experts from academia and industry working to address some of the challenges faced by developers and users of mathematical software. The goal of the group is to address artifacts of implementation decisions made by library and system developers in the absence of a standard or even a consensus. The lack of a published standard means that the user of a mathematical software library must rely on their skill, experience and wits, or on the possibly weakly supported opinions of others, to assess the quality of such a library or its compatibility with other libraries.

    The emergence of new technologies, such as multicore processors, further exacerbates the situation. Developers are racing to build libraries that take full advantage of these technologies, and again the lack of a reference standard will seriously pollute the results.

    For the library or system developer, the NMC Standard will provide essential information with respect to mathematical decisions (where the mathematical definition is not unique or does not uniquely determine the computer library definition), calling sequences, return types and algorithm parameters, among other characteristics. It will also provide accuracy and consistency benchmarks and evaluations. For the user, the successful evaluation of a numerics library or system against the NMC Standard will provide assurance that the library or system can be used as is and will provide results consistent with other similarly evaluated libraries or systems.


    Data analysis for the Microwave Limb Sounder instrument on the EOS Aura Satellite
    Van Snyder, Jet Propulsion Laboratory, California Institute of Technology

    A brief discussion of the goals of the Microwave Limb Sounder (MLS) instrument on NASA's Earth Observing System (EOS) Aura satellite, which was launched on 15 July 2004, is followed by a discussion of the organization and mathematical methods of the software used for analysis of data received from that instrument.

    The MLS instrument measures microwave thermal emission from the atmosphere by scanning the earth's limb 3500 times per day. The limb tangent altitude varies from 8 to 80 km. Roughly 600 million measurements in five spectral bands are returned every day. From these, estimates of the concentration of approximately 20 trace constituents, and temperature, are formed at 70 pressure levels on each of these scans - altogether roughly 5 million results per day. The program is organized to process "chunks" consisting of about 20 scans, with a five-scan overlap at both ends of the chunk, giving 350 chunks per day. Each chunk requires about 15 hours on a 3.6 GHz Pentium Xeon workstation. Although one chunk can be processed on a workstation, processing all of the data requires the attention of 350 such processors.

    The software is an interpreter of a "little language," as described in the April 2008 issue of Software: Practice and Experience. This confers substantial benefits for the organization, development, maintenance and operation of the software. Most importantly, it separates the responsibilities, and the requirements for expertise, of the software engineers who develop and maintain the software from those of the scientists who configure it.

    Mathematically, the problem consists of tracing rays through the atmosphere, roughly corresponding to limb views of the instrument. The radiative transfer equation, together with variational equations with respect to the parameters of interest, is then integrated along these rays. A Newton method is then used to solve for the parameters of interest.


    Parallel Option Pricing with Fourier Space Time-stepping Method on Graphics Processing Units
    Vladimir Surkov, University of Toronto

    With the evolution of Graphics Processing Units (GPUs) into powerful and cost-efficient computing architectures, their range of application has expanded tremendously, especially in the area of computational finance. Current research in the area, however, is limited in terms of the options priced and the complexity of the stock price models. We present algorithms, based on the Fourier Space Time-stepping (FST) method, for pricing single- and multi-asset European and American options with Lévy underliers on a GPU. Furthermore, the single-asset pricing algorithm is parallelized to attain greater efficiency.
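    The essence of an FST step can be sketched for the simplest Lévy process, geometric Brownian motion (an illustrative single-asset European pricer of my own, not the GPU code from the talk): transform the payoff, multiply by the exponential of the characteristic exponent of the log-price, and transform back.

```python
import numpy as np

def fst_european_call(S0, K, r, sigma, T, N=2**12, x_width=10.0):
    """One Fourier-space time step for a European call under geometric
    Brownian motion. The payoff on a log-moneyness grid is FFT'd,
    multiplied by exp(psi(w)*T) where psi is the characteristic
    exponent of the log-price (with discounting folded in), and
    inverse-FFT'd. For a linear constant-coefficient problem one step
    covers the whole horizon."""
    x = np.linspace(-x_width / 2, x_width / 2, N)    # log-moneyness grid
    dx = x[1] - x[0]
    w = 2 * np.pi * np.fft.fftfreq(N, d=dx)          # angular frequencies
    psi = 1j * (r - 0.5 * sigma**2) * w - 0.5 * sigma**2 * w**2 - r
    payoff = np.maximum(S0 * np.exp(x) - K, 0.0)
    v = np.real(np.fft.ifft(np.fft.fft(payoff) * np.exp(psi * T)))
    return float(np.interp(0.0, x, v))               # value at x = 0, i.e. S = S0
```

    Since the work is dominated by FFTs and an elementwise complex multiply, the method maps naturally onto GPU hardware; for a general Lévy model only `psi` changes, and American exercise adds a comparison with the payoff between timesteps.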


    Computational challenges in time-frequency analysis
    Hongmei Zhu, York University

    Time-varying frequency content is one of the most common features of signals found in a wide range of applications such as biology, medicine and geophysics. Time-frequency analysis provides various techniques to represent a signal simultaneously in time and frequency, with the aim of revealing how the frequency content evolves over time. Often the time-frequency representation of an N-point time series is stored in an N-by-N matrix, which presents computational challenges as the size or the dimensionality of the data increases. Therefore, the availability of efficient algorithms and computing schemes is key to fully utilizing time-frequency analysis in practical applications. This talk addresses these computational challenges as they arise in biomedical applications and summarizes recent progress in the area.
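    A minimal example of where the quadratic storage comes from (my sketch, not a method from the talk): a naive short-time Fourier transform takes one windowed FFT per time sample, producing a dense time-frequency matrix with a column for every sample.

```python
import numpy as np

def stft_matrix(signal, win_len=64):
    """Naive short-time Fourier transform: slide a Hann window to every
    sample position and take an FFT, giving a magnitude matrix with one
    column per time index. For an N-point signal this stores O(N * win_len)
    values, which is why long or multi-dimensional signals get expensive."""
    n = len(signal)
    win = np.hanning(win_len)
    pad = np.concatenate([np.zeros(win_len // 2), signal, np.zeros(win_len)])
    cols = [np.fft.rfft(pad[t:t + win_len] * win) for t in range(n)]
    return np.abs(np.array(cols)).T   # rows: frequency bins, cols: time
```

    Practical schemes hop the window by several samples, use recursive updates, or exploit structure in the chosen distribution to avoid materializing the full matrix.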


    The Design and Implementation of a Modeling Package
    Hossein ZivariPiran, University of Toronto

    Designing a modeling package with different functionalities (simulation, sensitivity analysis, parameter estimation) is a challenging and difficult process. The time-consuming task of implementing efficient algorithms for the core computations, designing a user-friendly interface, balancing generality and efficiency, and keeping the code manageable are just some of the issues. In this talk, based on our recent experience developing a package for modeling delay differential equations, we discuss techniques that can be used to overcome some of these difficulties. We present a process for incorporating existing codes to reduce implementation time, and we discuss the object-oriented paradigm as a way of achieving both a manageable design and a user-friendly interface.