*Case Study: Using Coconut to Optimise non-Cartesian
MRI for the Cell BE*

by

**Christopher Kumar Anand**, McMaster University and Optimal
Computational Algorithms, Inc.

Coauthors: Wolfram Kahl

We will discuss the opportunities presented by novel multi-core
architectures and the Cell BE in particular, and the challenges
in harnessing their potential. We will then describe the support
for code optimisation provided by the tool Coconut (COde CONstructing
User Tool). Finally, we will demonstrate the use of this tool
in optimising non-Cartesian magnetic resonance image reconstruction.
While Coconut makes it easy to develop complicated implementations
correctly, it does not automate the replacement of essentially
serial algorithms with parallel algorithms. This is the fun, creative
part of software development, which is highly variable from application
to application. In the MRI case, we will show how to modify the
basic reconstruction processes to tune the algorithm to the local
memory bandwidth, cache size and processor FLOPs.

--------------------------------------------------------------------------------

*Adaptive and high-order methods for American
option pricing*

by

**Duy Minh Dang**, University of Toronto

Coauthors: Christina Christara

We present an adaptive and high-order method for pricing American
vanilla options written on a single asset with constant volatility
and interest rate. The high-order discretization in space is based
on optimal quadratic spline collocation (QSC) methods while the
timestepping is handled by the Crank-Nicolson (CN) technique.
To control the spatial error, a mesh redistribution strategy based
on an equidistribution principle is used. At each timestep, only
one tridiagonal linear system needs to be solved. Numerical results
indicate the efficiency of the adaptive techniques and high-order
discretization methods.
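
Since each Crank-Nicolson step reduces to one tridiagonal solve, the
per-step cost is linear in the number of mesh points. As a minimal
illustration (a generic Thomas-algorithm solver, not the authors' QSC
code), such a system can be solved in O(n) operations:

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system with sub-diagonal a, diagonal b,
    super-diagonal c, and right-hand side d (all length n, with
    a[0] and c[-1] unused), in O(n) operations."""
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # back substitution
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

The linear per-step cost is what keeps the adaptive remeshing
affordable at every timestep.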

--------------------------------------------------------------------------------

*Programming and Mathematical Software
for Multicore Architectures*

by

**Craig C. Douglas**, University of Wyoming and Yale University

Essentially all computers sold in 2008 will have multiple cores.
The vast majority of multicore CPUs will have identical cores,
though not all. In 2009, there will be several CPUs available
with different types of cores. In this talk, I will review multicore
architectures briefly and will discuss different ways to program
them and provide pointers to mathematical software that now exists
to help use them for scientific programming.

This talk is based on the speaker's experiences teaching a semester
course on the topic during the spring 2008 semester as a visiting
professor at Texas A&M University.

--------------------------------------------------------------------------------

*The Cell BE: architecture/programming,
vector elementary functions, and other numerical software*

by

**Robert F. Enenkel**, IBM Canada Ltd.

Coauthors: Christopher K. Anand, McMaster University and Optimal
Computational Algorithms, Inc.

My talk will give an introduction to the Cell BE heterogeneous
multiprocessor computer, including topics such as hardware architecture,
instruction set and programming considerations, floating point
characteristics, and development/simulation tools. I will then
discuss the design of SPU MASS, a high-performance SIMD/vector
elementary function library for the Cell BE SPU. If time allows,
I will also give a survey of other numerical libraries and/or
scientific computing applications on Cell BE.

--------------------------------------------------------------------------------

*Developing Easy-to-use ODE Software: The Associated Cost/Reliability
Trade-offs*

by

**Wayne Enright**, University of Toronto

In the numerical solution of ODEs, it is now possible to develop
efficient techniques that compute approximate solutions that are
more convenient to interpret and use by researchers who are interested
in accurate and reliable simulations of their mathematical models.
To illustrate this, we have developed a class of numerical ODE
solvers that deliver a piecewise polynomial as the approximate
(or numerical) solution to an ODE problem.

The methods are designed so that the resulting piecewise
polynomial satisfies a perturbed ODE with an associated defect
(or residual) that is directly controlled in a consistent way.
We will discuss the quality/cost trade-off that one faces when
implementing and using such methods. Numerical results on a range
of problems will be summarized for methods of orders five through
eight.
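
The controlled quantity can be stated concretely: for a continuous
approximation p(t) to y' = f(t, y), the defect is
δ(t) = p'(t) − f(t, p(t)), and the solver keeps an estimate of its
maximum on each step below a tolerance. A small sketch of defect
sampling (a generic illustration, not code from these solvers):

```python
import math

def max_defect(p, dp, f, t0, t1, n=11):
    """Estimate the maximum defect delta(t) = p'(t) - f(t, p(t)) of a
    continuous approximate solution p (with derivative dp) to the ODE
    y' = f(t, y), by sampling n points on the step [t0, t1]."""
    ts = [t0 + i * (t1 - t0) / (n - 1) for i in range(n)]
    return max(abs(dp(t) - f(t, p(t))) for t in ts)
```

For y' = y, the exact solution e^t has zero defect, while a truncated
Taylor polynomial has a small defect that grows with the step size.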

--------------------------------------------------------------------------------

*A Numerical Scheme for the Impulse Control
Formulation for Pricing Variable Annuities with a Guaranteed Minimum
Withdrawal Benefit (GMWB)*

by

**Peter Forsyth**, University of Waterloo

Coauthors: Zhuliang Chen

In this paper, we outline an impulse stochastic control formulation
for pricing variable annuities with a Guaranteed Minimum Withdrawal
Benefit (GMWB) assuming the policyholder is allowed to withdraw
funds continuously. We develop a numerical scheme for solving
the Hamilton-Jacobi-Bellman (HJB) variational inequality corresponding
to the impulse control problem. We prove the convergence of our
scheme to the viscosity solution of the continuous withdrawal
problem, provided a strong comparison result holds. The scheme
can be easily generalized to price discrete withdrawal contracts.
Numerical experiments are conducted, which show a region where
the optimal control appears to be non-unique.

--------------------------------------------------------------------------------

*Dynamically-consistent Finite-Difference
Methods for Disease Transmission Models*

by

**Abba Gumel**, University of Manitoba

Models for the transmission dynamics of human diseases are often
formulated in the form of systems of nonlinear differential equations.
The (typically) large size and high nonlinearity of some of these
systems make their analytical solutions almost impossible to obtain.
Consequently, robust numerical methods must be used to obtain
their approximate solutions. This talk addresses the problem and
challenges of designing discrete-time models (finite-difference
methods) that are dynamically consistent with the continuous-time
disease transmission models they approximate (in particular in
preserving some of the key properties of the continuous-time models
such as positivity, boundedness, mimicking correct bifurcations
etc.).
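
As a toy example of dynamic (in)consistency (my own illustration, not
taken from the talk): for the decay equation y' = −λy, explicit Euler
loses positivity as soon as λh > 1, while a nonstandard (Mickens-type)
scheme with denominator function φ(h) = (1 − e^(−λh))/λ stays positive,
and is in fact exact, for every step size:

```python
import math

def euler_decay(y0, lam, h, steps):
    """Explicit Euler for y' = -lam*y: y_{n+1} = (1 - lam*h) * y_n.
    Goes negative (loses positivity) as soon as lam*h > 1."""
    y = [y0]
    for _ in range(steps):
        y.append((1 - lam * h) * y[-1])
    return y

def nsfd_decay(y0, lam, h, steps):
    """Nonstandard scheme (y_{n+1} - y_n) / phi = -lam * y_n with the
    denominator function phi = (1 - exp(-lam*h)) / lam, so that
    y_{n+1} = exp(-lam*h) * y_n: positive, and exact, for any h."""
    phi = (1 - math.exp(-lam * h)) / lam
    y = [y0]
    for _ in range(steps):
        y.append((1 - lam * phi) * y[-1])
    return y
```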

--------------------------------------------------------------------------------

*Portable Software Development for Multi-core
Processors, Many-core Accelerators, and Heterogeneous Architectures*

by

**Michael McCool**, RapidMind/University of Waterloo

New processor architectures, including many-core accelerators
like GPUs, multi-core CPUs, and heterogeneous architectures like
the Cell BE, provide many opportunities for improved performance.
However, programming these architectures productively in a performant
and portable way is challenging. We have developed a software
development platform that uses a common SPMD parallel programming
model for all these processor architectures. The RapidMind platform
allows developers to easily create single-source, single-threaded
programs with an existing, standard C++ compiler that can target
all the processing resources in such architectures. When compared
to tuned baseline code using the best optimizing C++ compilers
available, RapidMind-enabled code can demonstrate speedups of
over an order of magnitude on x86 dual-processor quad-core systems
and two orders of magnitude on accelerators.
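
The SPMD idea itself fits in a few lines: one kernel, many disjoint
chunks of data, results gathered in order. The sketch below uses
Python threads purely to illustrate the model and bears no relation
to RapidMind's actual C++ interface:

```python
from concurrent.futures import ThreadPoolExecutor

def spmd_map(kernel, data, workers=4):
    """Run the same kernel over disjoint chunks of the input, one chunk
    per worker, and concatenate the results in order (the essence of
    the SPMD programming model)."""
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as ex:
        parts = ex.map(lambda chunk: [kernel(x) for x in chunk], chunks)
    return [y for part in parts for y in part]
```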

--------------------------------------------------------------------------------

*Deterministic and stochastic models
in synthetic biology*

by

**David McMillen**

Dept of Chemical and Physical Sciences, University of Toronto
Mississauga

Synthetic biology is an emerging field in which we seek to design
and implement novel biological systems, or change the properties
of existing ones. Synthetic biology is closely related to systems
biology, and one of its recurring themes is the use of cellular models
as an aid in design: we want to be able to gain some predictive
insight into cellular processes, rather than simply using trial-and-error.

A variety of modelling methods have been used to approach this
problem, and I will present a non-specialist introduction to the
biological issues, and a survey of some popular techniques currently
being applied in biological modelling work.
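
One standard deterministic/stochastic pairing is the birth-death
process: the rate equation dn/dt = k_b − k_d·n versus a Gillespie
simulation of the same two reactions. A minimal sketch of the
stochastic side (a generic illustration, not a model from the talk):

```python
import random

def gillespie_birth_death(k_birth, k_death, n0, t_end, seed=0):
    """Exact stochastic simulation (Gillespie SSA) of a birth-death
    process: production at constant rate k_birth, degradation at rate
    k_death * n. Returns the jump times and molecule counts."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while t < t_end:
        a_birth = k_birth
        a_death = k_death * n
        a_total = a_birth + a_death
        t += rng.expovariate(a_total)    # waiting time to next reaction
        if rng.random() * a_total < a_birth:
            n += 1                       # birth
        else:
            n -= 1                       # death
        times.append(t)
        counts.append(n)
    return times, counts
```

The trajectory fluctuates around the deterministic steady state
k_b/k_d rather than settling onto it, which is one reason stochastic
models matter at the low copy numbers typical of cells.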

--------------------------------------------------------------------------------

*Modern Radiation Therapy: Image is Everything*

by

**Douglas Moseley**, Princess Margaret Hospital

Coauthors: Michael B. Sharpe, David A. Jaffray

Recent technical advances in Image-Guided Radiation Therapy (IGRT)
have motivated many applications in high-performance computing.
These include the addition of a kV x-ray tube and an amorphous
silicon flat-panel detector to the gantry of a medical linear
accelerator. A series of 2D kV projection images of the patient
is captured as the gantry rotates. These 2D projection images
are then reconstructed into a 3D volumetric image using a filtered
back projection technique. The algorithm, a modified Feldkamp,
is CPU intensive but highly parallelizable and hence can take
advantage of multi-processor architectures. At Princess Margaret
Hospital about 25% of our patients receive daily on-line image
guidance.

In this talk an introduction to the field of radiation therapy
will be presented. The back-projection algorithm will be discussed
including an extension of the reconstruction algorithm which in
the presence of patient motion allows for the generation of 4D
images (3D in space and 1D in time). Other computing intensive
applications in radiation therapy such as image registration (rigid
and non-rigid) will be introduced.
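
The parallelism comes from the structure of back projection: every
pixel (or voxel) accumulates its value independently over gantry
angles. The sketch below is a plain unfiltered parallel-beam back
projection in Python; it omits the ramp filtering and cone-beam
weighting of the actual Feldkamp algorithm:

```python
import math

def back_project(sinogram, angles, n):
    """Reconstruct an n x n image from parallel-beam projections by
    (unfiltered) back projection: each pixel accumulates the ray value
    at its projected detector position for every gantry angle.
    The pixel loop is embarrassingly parallel."""
    img = [[0.0] * n for _ in range(n)]
    c = (n - 1) / 2.0
    for proj, theta in zip(sinogram, angles):
        ct, st = math.cos(theta), math.sin(theta)
        for iy in range(n):
            for ix in range(n):
                # detector coordinate of this pixel at angle theta
                s = (ix - c) * ct + (iy - c) * st + c
                i0 = int(s)
                if 0 <= i0 < n - 1:
                    w = s - i0  # linear interpolation between detector bins
                    img[iy][ix] += (1 - w) * proj[i0] + w * proj[i0 + 1]
    return img
```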

--------------------------------------------------------------------------------

*A User-Friendly Fortran 90/95 Boundary
Value ODE Solver*

by

**Paul Muir**, Saint Mary's University

Coauthors: Larry Shampine

This talk will describe a recently developed user-friendly Fortran
90/95 software package for the numerical solution of boundary
value ordinary differential equations (BVODEs). This package,
called BVP_SOLVER, was written by the author and Larry Shampine,
of Southern Methodist University, and has evolved from an earlier
Fortran 77 package called MIRKDC, written by the author and Wayne
Enright, of the University of Toronto. This new solver takes advantage
of a number of language features of Fortran 90/95 to provide a
substantially simpler experience for the user, while at the same
time employing the underlying robust, high quality algorithms
of the original MIRKDC code. BVP_SOLVER has a simple, flexible
interface that is related to that of the Matlab BVODE solver,
bvp4c (Kierzenka and Shampine), and at the same time, it has the
fast execution speed of Fortran. BVP_SOLVER implements several
numerical algorithms that in fact represent improvements over
those employed in MIRKDC. Examples are presented to demonstrate
the capabilities of the new solver. Several current projects associated
with enhancing the capabilities of BVP_SOLVER will also be discussed.

--------------------------------------------------------------------------------

*HPC, an effective tool for three-dimensional
simulations of nonlinear internal waves*

by

**Van Thinh Nguyen**, University of Waterloo

Coauthors: K.G. Lamb

Stratified flows over topographies present challenging geophysical
fluid dynamic problems with far reaching implications for circulation
and mixing in oceans and the Great Lakes. The complexity of natural
boundaries and topographies means that a three-dimensional model
is often necessary. In addition, the hydrostatic approximation
breaks down for small-scale processes such as the steepening,
formation and breaking of nonlinear internal and non-hydrostatic
waves; therefore, a non-hydrostatic three-dimensional model is
required to study such phenomena. Finally, in order to simulate
small-scale processes, high resolutions are necessary; and as
a result, this requires use of High Performance Computers (HPC).

Nonlinear internal waves play an important role in redistribution
of nutrients, pollutants and sediments in oceans and lakes. In
this study, large scale simulations of nonlinear internal waves
generated by tidal and wind forcing over three-dimensional topographies
in the St. Lawrence Estuary (SLE) and Lake Erie are described.
The simulations are based on the MIT general circulation model
(MITgcm) developed by Marshall et al., (1997) with some modifications
for quasi-two layer fluid and open boundary conditions. The code
is designed using a multi-level parallel decomposition to facilitate
efficient execution on all foreseeable parallel computer architectures
and processor types (Wrappable Application Parallel Programming
Environment Resources Infrastructure).

The simulation domain has been divided into a number of subdomains
as tiles. Tiles consist of an interior region and an overlap region
with an adjacent tile. The resulting tiles are owned by an individual
processor and it is possible for a processor to own several tiles.
The owning processors perform the arithmetic operations associated
with a tile. Except for periods of communication or coordination,
each processor computes autonomously, working only with data from
the tile (or tiles) that the processor owns. The code has been
scaled on different parallel platforms (shared and distributed
memories) provided by SHARCNET (Ontario) and RQCHP (Quebec).
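
The overlap regions act as halos in the usual domain-decomposition
sense. A one-dimensional toy version (not MITgcm code) shows the
communication phase: each tile's halo is refreshed from its
neighbour's edge cells before the next compute phase:

```python
def exchange_halos(tiles, halo):
    """Refresh the overlap (halo) cells of adjacent 1-D tiles: each
    tile's halo is overwritten with the neighbouring tile's edge
    interior cells, so stencil updates near tile boundaries see
    up-to-date data."""
    for i in range(len(tiles) - 1):
        left, right = tiles[i], tiles[i + 1]
        # left tile's right halo <- right tile's leftmost interior cells
        left[-halo:] = right[halo:2 * halo]
        # right tile's left halo <- left tile's rightmost interior cells
        right[:halo] = left[-2 * halo:-halo]
    return tiles
```

Between exchanges, each tile can be updated autonomously by its
owning processor, exactly as described above.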

The numerical results are compared with data from direct observations
in the St. Lawrence Estuary (SLE) and Lake Erie. In addition,
a comparison of performances between different platforms will
be presented.

--------------------------------------------------------------------------------

*Real-Time Big Physics Applications In
National Instruments LabVIEW*

by

**Darren Schmidt**, National Instruments

Coauthors: Bryan Marker, Lothar Wenzel, Louis Giannone, Bertrand
Bauvir, Toomas Erm

Many big physics applications require an enormous amount of computational
power to solve large mathematical problems with demanding real-time
constraints. Typically, the mathematical model is fed data from
real-world acquisition channels and required to generate outputs
in less than 1 ms for problem sizes of dimension 1,000 to 100,000
and beyond.

We report on an approach for tokamaks and extremely large telescopes
that utilizes an application development environment called LabVIEW.
This environment offers algorithm development that naturally exposes
parallel processing through its graphical programming language
and takes advantage of multi-core processors without user intervention.
We show that the required computational power is accessible through
LabVIEW via multi-core implementations targeting a combination
of processors including CPUs, FPGAs, and GPUs deployed on standard
PCs, workstations, and real-time systems. Several benchmarks illustrate
what is achievable with such an approach.

--------------------------------------------------------------------------------

*The Numerical Mathematics Consortium
- Establishing a Verification Framework for Numerical Algorithm
Development*

by

**Darren Schmidt**, National Instruments

The Numerical Mathematics Consortium (NMC) is a group of numerics
experts from academia and industry working to address some of
the challenges faced by developers and users of mathematical software.
The goal of the group is to address artifacts of implementation
decisions made by library and system developers in the absence
of a standard or even a consensus. The lack of a published standard
means that the user of a mathematical software library must rely
on their skill, experience and wits, or on the possibly weakly
supported opinions of others, to assess the quality of such a
library or its compatibility with other libraries.

The emergence of new technologies, such as multicore processors,
further exacerbates the situation. Developers are racing to develop
libraries which can take full advantage of these technologies,
and again the lack of a reference standard will seriously pollute
the results.

For the library or system developer, the NMC Standard will provide
essential information with respect to mathematical decisions (where
the mathematical definition is not unique or does not uniquely
determine the computer library definition), calling sequences,
return types and algorithm parameters, among other characteristics.
It will also provide accuracy and consistency benchmarks and evaluations.
For the user, the successful evaluation of a numerics library
or system against the NMC Standard will provide assurance that
the library or system can be used as is and will provide results
consistent with other similarly evaluated libraries or systems.

--------------------------------------------------------------------------------

*Data analysis for the Microwave Limb
Sounder instrument on the EOS Aura Satellite*

by

**Van Snyder**, Jet Propulsion Laboratory, California Institute
of Technology

A brief discussion of the goals of the Microwave Limb Sounder
(MLS) instrument on NASA's Earth Observing System (EOS) Aura satellite,
which was launched on 15 July 2004, is followed by a discussion
of the organization and mathematical methods of the software used
for analysis of data received from that instrument.

The MLS instrument measures microwave thermal emission from the
atmosphere by scanning the earth's limb 3500 times per day. The
limb tangent altitude varies from 8 to 80 km. Roughly 600 million
measurements in five spectral bands are returned every day. From
these, estimates of the concentration of approximately 20 trace
constituents, and temperature, are formed at 70 pressure levels
on each of these scans - altogether roughly 5 million results
per day. The program is organized to process "chunks"
consisting of about 20 scans, with a five-scan overlap at both
ends of the chunk, giving 350 chunks per day. Each chunk requires
about 15 hours on a 3.6 GHz Pentium Xeon workstation. Although
one chunk can be processed on a workstation, processing all of
the data requires the attention of 350 such processors.
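
The chunking is simple index bookkeeping: each chunk of scans is
extended by the overlap on both sides, clipped at the first and last
scan. A generic sketch (the actual MLS chunk and overlap sizes are as
given above):

```python
def make_chunks(n_scans, chunk, overlap):
    """Split scan indices 0..n_scans-1 into chunks of `chunk` scans,
    each extended by `overlap` extra scans on both ends, clipped at
    the data boundaries. Returns (first, past-last) index pairs."""
    spans = []
    for start in range(0, n_scans, chunk):
        lo = max(0, start - overlap)
        hi = min(n_scans, start + chunk + overlap)
        spans.append((lo, hi))
    return spans
```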

The software is an interpreter of a "little language,"
as described in the April 2008 issue of Software: Practice and Experience.
This confers substantial benefits for organization, development,
maintenance and operation of the software. Most importantly, it
separates the responsibilities, and the requirements for expertise,
of the software engineers who develop and maintain the software,
from the scientists who configure the software.

Mathematically, the problem consists of tracing rays through
the atmosphere, roughly corresponding to limb views of the instrument.
The radiative transfer equation, together with the variational equations
with respect to the parameters of interest, is then integrated
along these rays. A Newton method is then used to solve for the
parameters of interest.

--------------------------------------------------------------------------------

*Parallel Option Pricing with Fourier
Space Time-stepping Method on Graphics Processing Units*

by

**Vladimir Surkov**, University of Toronto

With the evolution of Graphics Processing Units (GPUs) into powerful
and cost-efficient computing architectures, their range of application
has expanded tremendously, especially in the area of computational
finance. Current research in the area, however, is limited in
terms of options priced and complexity of stock price models.
We present algorithms, based on the Fourier Space Time-stepping
(FST) method, for pricing single and multi-asset European and
American options with Lévy underliers on a GPU. Furthermore, the
single-asset pricing algorithm is parallelized to attain greater
efficiency.
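
The heart of the FST method is one multiplication per timestep in
frequency space: transform the option values, scale mode ω by
exp(ψ(ω)Δt), where ψ is the characteristic exponent of the driving
Lévy process, and transform back. A pure-Python sketch of that update
with a naive O(N²) DFT (a production code would use an FFT; this is an
illustration, not the GPU implementation):

```python
import cmath, math

def fst_step(v, psi, dt):
    """One Fourier Space Time-stepping update on a periodic grid:
    transform v to frequency space, multiply each mode by
    exp(psi(omega) * dt), and transform back."""
    n = len(v)
    # forward DFT (naive O(n^2) for clarity)
    V = [sum(v[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
         for k in range(n)]
    # frequencies in cycles per domain, wrapped to negative for k > n/2
    freqs = [k if k <= n // 2 else k - n for k in range(n)]
    V = [Vk * cmath.exp(psi(2 * math.pi * f) * dt) for Vk, f in zip(V, freqs)]
    # inverse DFT, keeping the real part
    return [sum(V[k] * cmath.exp(2j * math.pi * k * j / n)
                for k in range(n)).real / n for j in range(n)]
```

With the Brownian exponent ψ(ω) = −σ²ω²/2 this propagates the heat
kernel; a Lévy model simply supplies a different ψ.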

--------------------------------------------------------------------------------

*Computational challenges in time-frequency
analysis*

by

**Hongmei Zhu**, York University

Time-varying frequency content is one of the most common features
of signals found in a wide range of applications such as biology,
medicine and geophysics. Time-frequency analysis provides various
techniques to represent a signal simultaneously in time and frequency,
with the aim of revealing how the frequency content evolves over
time. Often the time-frequency representation of an N-point time
series is stored in an N-by-N matrix. It presents computational
challenges as the size or the dimensionality of data increases.
Therefore, the availability of efficient algorithms or computing
schemes is the key to fully utilize the properties of the time-frequency
analysis for practical applications. This talk addresses these
computational challenges arising in biomedical applications and
summarizes recent progress in this area.
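
The N-by-N storage cost appears already for the simplest
time-frequency representation, the short-time Fourier transform,
which computes one spectrum per time shift. A minimal sketch with a
rectangular window and a hop of one sample (illustrative only):

```python
import cmath, math

def stft(x, win):
    """Short-time Fourier transform magnitudes of a real signal x with
    a sliding rectangular window of length `win` and hop of 1 sample.
    Rows index window position (time), columns index frequency bin."""
    out = []
    for t in range(len(x) - win + 1):
        seg = x[t:t + win]
        out.append([abs(sum(seg[j] * cmath.exp(-2j * math.pi * k * j / win)
                            for j in range(win)))
                    for k in range(win // 2 + 1)])
    return out
```

For an N-point signal this already yields on the order of N spectra,
which is the quadratic storage growth discussed above.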

--------------------------------------------------------------------------------

*The Design and Implementation
of a Modeling Package*

by

**Hossein ZivariPiran**, University of Toronto

Designing a modeling package with different functionalities (simulation,
sensitivity analysis, parameter estimation) is a challenging and
difficult process. The time-consuming implementation of efficient
algorithms for the core computations, the design of a user-friendly
interface, the balance between generality and efficiency, and the
manageability of the code are just some of the issues. In this talk, based on
our recent experience in developing a package for modeling delay
differential equations, we discuss some techniques that can be
used to overcome some of these difficulties. We present a process
for incorporating existing codes to reduce the implementation
time. We discuss the object-oriented paradigm as a way of having
a manageable design and also a user-friendly interface.