## SCIENTIFIC PROGRAMS AND ACTIVITIES

August 23, 2014
THE FIELDS INSTITUTE FOR RESEARCH IN MATHEMATICAL SCIENCES
22nd International Workshop on Matrices and Statistics
August 12-15, 2013
Location: Bahen Centre, 40 St. George St., Room 1180
CONTRIBUTED TALKS ABSTRACTS


CONFIRMED

• Akbar Azam: Random operator equations in probabilistic functional analysis
• Philip V. Bertrand: A matrix method for solving missing data problems
• Nino Demetrashvili: Confidence intervals for intraclass correlation coefficients in nonlinear mixed effects models
• Sami Helle: Distance Optimal Designs for Linear Models
• Xiaomi Hu: Order restricted multivariate two-sample problems
• Sayantee Jana: Parameter estimation and moderated trace test for the growth curve model for high-dimensional longitudinal data
• Eero Liski: Averaging orthogonal projectors
• Erkki P. Liski: On subspace distance and the mean of subspaces
• Augustyn Markiewicz: Optimal neighbor designs under several interference models
• Mika Mattila: Estimating the eigenvalues of meet and join matrices
• Joseph Nzabanita: Multivariate linear models with Kronecker product and linear structures on the covariance matrices
• Haruhiko Ogasawara: Bias adjustment minimizing the asymptotic mean square error
• Yuichiro Ogawa: A cut-off point for diagonal discriminant analysis in high dimension
• Simo Puntanen: Formulas Useful for Linear Regression Analysis and Related Matrix Theory
• Simo Puntanen: Flashes from the Second Tampere Conference in Statistics in 1987 and the first IWMS in 1990
• David Titley-Peloquin: Stochastic conditioning of systems of equations
• Julia Volaufova: Two-stage approximate testing in nonlinear mixed models
• Hans Joachim Werner: In the Year of Statistics: C. R. Rao's IPM Method - Revisited and Extended

AWAITING CONFIRMATION

• Haftom T. Abebe: Bayesian design for dichotomous repeated measurements with serial correlation
• Dila Ram Bhandari: Statistical Forecasting: Analytical Tools
• Anis Iranmanesh: Generalized Matrix T Distribution Through Generalized Multivariate Gamma Distribution
• Zohreh Javanshiri: Finding the information matrix for exp-uniform distribution based on complete and type-II censored data
• Syeda Rabab Mudakkar: Rademacher inequalities for Operators
• Sathish Pichika: Integration of miRNA and mRNA Expressions: An Application of Sparse Canonical Correlation Analysis (SCCA)

WITHDRAWN

• Arezou Habibi Rad: Inference based on unified hybrid censored data for Weibull distribution

Haftom T. Abebe, University of Maastricht, Netherlands
Bayesian design for dichotomous repeated measurements with serial correlation
Coauthors: Frans E. S. Tan, Gerard J. P. Van Breukelen and Martijn P. F. Berger

In medicine and health sciences a binary outcome is often measured repeatedly to study its change over time. A well-known problem for such studies is that designs that have an optimal efficiency for some parameter values may not be efficient at all for other parameter values. We propose Bayesian designs which formally account for the uncertainty in the parameter values for a mixed logistic model which allows linear or quadratic changes. The Bayesian D-optimal allocations of time points of measurement are computed for different priors, covariance structures and different values of autocorrelation. Since the costs per subject may be quite different from the costs per measurement, a subject-to-measurement cost ratio is taken into account when designs are compared. The results show that the optimal number of time points increases with the cost ratio, and that neither the optimal number nor the optimal allocation of time points appears to depend strongly on the prior, the covariance structure, or the amount of autocorrelation. It also appears that four equidistant time points are highly efficient for cost ratios up to five, and six equidistant time points for larger cost ratios. Moreover, choosing the number of time points well seems to be more crucial than allocating the time points as closely as possible to the Bayesian optimal design. Our results are compared with the actual design of a study of respiratory infection in Indonesian preschool children and with the design of a smoking prevention study in primary schools in the Netherlands.

Akbar Azam, COMSATS Institute of Information Technology
Random operator equations in probabilistic functional analysis

Random operator theory is needed for the study of various classes of random operator equations in probabilistic functional analysis. During the last three decades several results regarding random fixed points of various types of random operators have been established, and a number of their applications have been obtained in mathematical statistics. In fact, random fixed point theorems are stochastic generalizations of deterministic (classical) fixed point theorems and have important applications in random operator equations, random differential equations and differential inclusions. In the present talk we derive common random fixed point theorems for multivalued random operators satisfying a contractive condition.

Philip V. Bertrand
A matrix method for solving missing data problems

Given a large data set with numbers missing at random, fitting multivariate distributions to it is difficult. It is shown to be possible to fit an appropriately specified multivariate distribution with unknown parameters using rather complex matrix methods. Examples for the multivariate normal distribution and for the Cox proportional hazards model are described.

Dila Ram Bhandari, Tribhuvan University
Statistical Forecasting: Analytical Tools

Statistics plays a vital role in every field of human activity. Statistical tools such as index numbers, correlation, time series analysis, regression analysis, hypothesis testing, and multivariate analysis help to analyse data and to predict the future. Forecasting is the process of making statements about events whose actual outcomes have not yet been observed, that is, estimating the likelihood of a future event from the available data. Statistical forecasting concentrates on using the past to predict the future by identifying trends, patterns, and business and economic drivers within the data, and develops a forecast with tools such as regression analysis and time-series analysis. Statistics is a set of techniques used in collecting, analyzing, presenting, and interpreting data. Statistical methods are used in a wide variety of occupations and help people identify, study, and solve many complex problems; they are also widely used in business and economics. Such a forecast is called a statistical forecast because it uses mathematical formulas to identify patterns and trends while testing the results for mathematical reasonableness and confidence. In many forecasting processes, statistical forecasting forms the baseline that is adjusted throughout the process. Risk and uncertainty are central to forecasting and prediction; it is generally considered good practice to indicate the degree of uncertainty attached to forecasts.

Nino Demetrashvili, University of Groningen
Confidence intervals for intraclass correlation coefficients in nonlinear mixed effects models
Coauthors: Prof. Edwin van den Heuvel

In our previous work we proposed two generic approaches for constructing confidence intervals on intraclass correlation coefficients (ICCs) for variance components models. The first approach uses Satterthwaite's approximation and the F-distribution. The second approach uses the first and second moments of the ICC estimate in combination with a Beta distribution. The variance components were estimated with restricted maximum likelihood. The coverage probabilities of the confidence intervals were accurate for the Beta-approach on two balanced three-way variance components models, in particular for settings with small sample sizes.

In our previous work we focused on linear models, but here we investigate the performance of the Beta-approach for confidence intervals of the ICCs in nonlinear mixed effects models. The case study is a meta-analysis on anti-psychotic medications using Michaelis-Menten curves for dose-response relationships. In nonlinear mixed models, restricted maximum likelihood estimation is not well defined, and different approaches exist for estimating the variance components. We present the results of a simulation study that compares different estimation methods. The main focus is on small sample settings, which were motivated by our case study.
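
The abstract does not spell out how the two moments are turned into a Beta distribution. One common way to do this is method-of-moments matching, sketched below as an assumption rather than as the authors' construction. Here $\hat\mu$ and $\hat\sigma^2$ are hypothetical symbols for the estimated mean and variance of the ICC estimate, with $0 < \hat\sigma^2 < \hat\mu(1-\hat\mu)$, and $\gamma$ is the nominal error level.

$$
\hat\alpha = \hat\mu\left(\frac{\hat\mu(1-\hat\mu)}{\hat\sigma^2} - 1\right), \qquad
\hat\beta = (1-\hat\mu)\left(\frac{\hat\mu(1-\hat\mu)}{\hat\sigma^2} - 1\right),
$$

$$
\text{interval} = \left[\, B^{-1}_{\hat\alpha,\hat\beta}(\gamma/2),\; B^{-1}_{\hat\alpha,\hat\beta}(1-\gamma/2) \,\right],
$$

where $B^{-1}_{\alpha,\beta}$ denotes the quantile function of the $\mathrm{Beta}(\alpha,\beta)$ distribution. By construction, this Beta distribution has mean $\hat\mu$ and variance $\hat\sigma^2$.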

Sami Helle, Tampere University of Technology and University of Tampere
Distance Optimal Designs for Linear Models
Coauthors: Erkki Liski (University of Tampere)

Properties of the most familiar optimality criteria, for example A-, D- and E-optimality, are well known, but the distance stochastic optimality criterion has not drawn as much attention to date. There exists an extensive literature on the characterization of optimal designs under both discrete and continuous settings using the most familiar optimality criteria. For references see Pukelsheim (1993) and Liski et al. (2002), for example. Though the distance stochastic criterion (DS-criterion) was put forward over forty years ago in Sinha (1970), it began to attract attention only about ten years ago (see, for example, Liski et al. 1999, 2001; Zaigraev 2002, 2003, 2006).

In this paper we investigate properties of the DS-optimal designs for the linear model under normally distributed errors. Particular attention is paid to the mixture model.

References:
Liski, E. P., Mandal, N. K., Shah, K. R. and Sinha, B. K. (2002), Topics in Optimal Design, Springer, New York.
Liski, E. P., Luoma, A. and Zaigraev, A. (1999), Distance optimality design criterion in linear models, Metrika, 49, 193-211.
Liski, E. P. and Zaigraev, A. (2001), A stochastic characterization of Loewner optimality design criterion in linear models, Metrika, 53, 207-222.
Pukelsheim, F. (1993), Optimal Design of Experiments, Wiley, New York.
Sinha, B. K. (1970), On the optimality of some designs, Calcutta Statistical Association Bulletin, 20, 1-20.
Zaigraev, A. (2002), Shape optimal design criterion in linear models, Metrika, 56, 259-273.
Zaigraev, A. (2003), Integral stochastic optimal design criteria in linear models, Metrika, 57, 287-301.
Zaigraev, A. (2006), On DS-optimal design matrices with restrictions on rows or columns, Metrika, 64, 181-189.

Xiaomi Hu, Wichita State University
Order restricted multivariate two-sample problems

A p by 2 matrix is order restricted if its two columns are linked by an order, a reflexive and transitive relation of vectors. The projection onto the collection of all such restricted matrices plays a vital role in order restricted multivariate two-sample estimation and testing. In this talk we define a general vector order relation, establish the projection formula, and explore its applications in Statistics.

Anis Iranmanesh
Generalized Matrix T Distribution Through Generalized Multivariate Gamma Distribution
Coauthors: Mohammad Arashi (Department of Mathematics, Shahrood University of Technology, Shahrood, Iran)

In this paper, the construction of a generalized matrix t-type family is considered by conditioning on the covariance structure of the matrix variate normal distribution, thus providing a new perspective on this family. In this regard, a generalized multivariate gamma distribution involving zonal polynomials is introduced. Some important statistical characteristics are given. An attempt is made to reconsider the Bayes analysis of the column covariance matrix of the underlying population model. Thus an application of the proposed result is given in the Bayesian context of multivariate linear regression models.

Sayantee Jana, McMaster University
Parameter estimation and moderated trace test for the growth curve model for high-dimensional longitudinal data
Coauthors: Narayanaswamy Balakrishnan (McMaster University), Dietrich von Rosen (Swedish University of Agricultural Sciences), Jemila S. Hamid (McMaster University)

Growth curve models (GCM) are an essential tool for the analysis of longitudinal data. The traditional tests for growth curve models break down in the high-dimensional setup (n<p). In this study a moderated test is therefore proposed for testing the GCM in high-dimensional scenarios. Two types of moderation were considered: the Moore-Penrose generalized inverse and an empirical Bayes estimator. Extensive simulations demonstrated the performance of the moderated test, and the results were compared with the original trace test. Distance measures were used for comparison purposes because the parameters are matrices. Moderated MLE and BLUE are provided for the parameter matrix and the variance-covariance matrix, and their performance was assessed using bias and MSE, which were compared to the bias and MSE of the existing MLEs in the non-high-dimensional setup. The approach was illustrated using time-course microarray data from a lung cancer study.

Zohreh Javanshiri
Finding the information matrix for exp-uniform distribution based on complete and type-II censored data

The Fisher information matrix summarizes the amount of information in the data relative to the quantities of interest. It has applications in finding the variance of estimators, as well as in the asymptotic behaviour of the maximum likelihood estimator. In this paper, a new distribution called the exp-uniform distribution is proposed. The regularity conditions do not hold for the exp-uniform distribution, so we obtain the information matrix following Shao [2003] for complete and type-II censored data. We also provide estimators of the parameters using the maximum likelihood method. For illustrative purposes, a real data set is analysed.

Eero Liski, University of Tampere
Averaging orthogonal projectors
Coauthors: Klaus Nordhausen (University of Tampere), Hannu Oja (University of Turku) and Anne Ruiz-Gazen (Toulouse School of Economics).

Dimension reduction (DR) plays an important role in high dimensional data analysis. Often the interest is on regression, where the goal is to infer about the conditional distribution of the response y given the p-variate explanatory vector x. One then wishes to find an orthogonal projector P in such a way that y is independent of x given Px. Such celebrated DR methods as sliced inverse regression (SIR) and sliced average variance estimate (SAVE) are adequate for finding only certain types of relationships between y and x. Hence, we combine individual DR methods via their corresponding orthogonal projectors and strive to provide the best qualities of each individual DR method. This approach finds a reduced number of uncorrelated variables and circumvents the curse of dimensionality.

Erkki P. Liski, University of Tampere
On subspace distance and the mean of subspaces
Coauthors: Eero Liski

In multivariate problems, it is customary to use dimension reduction (DR) techniques which share the characteristic that the original data is projected onto a lower dimensional subspace. Principal component analysis is a typical example. For many applications it is the subspace, not its particular representation, that is important. Different DR methods like sliced inverse regression and projection pursuit capture different structures in data. The choice of the distance measure between subspaces is crucial when comparing the performance of various DR techniques. We address the question: how can one construct a subspace distance to establish relationships between two or more subspaces with possibly different dimensions? We present properties of alternative subspace distances and formulate the problem of finding the mean subspace as the computation of a matrix mean. This calls for considering an acceptable definition for a mean of positive semidefinite matrices. In addition to DR, subspace methods provide a useful approach to least squares model averaging in linear regression and to certain fields in pattern recognition, for example.
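
To make the idea of a distance between subspaces of different dimensions concrete, here is a minimal sketch using one candidate: the Frobenius norm of the difference between the two orthogonal projectors. The abstract does not say which distances are studied, so this choice and all function names below are illustrative assumptions.

```python
import math

def orthonormal_basis(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis of span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(wi * qi for wi, qi in zip(w, q))
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        length = math.sqrt(sum(wi * wi for wi in w))
        if length > tol:  # skip vectors that are already in the span
            basis.append([wi / length for wi in w])
    return basis

def projector(vectors):
    """Orthogonal projector P = Q Q^T onto span(vectors)."""
    Q = orthonormal_basis(vectors)
    n = len(vectors[0])
    return [[sum(q[i] * q[j] for q in Q) for j in range(n)] for i in range(n)]

def projection_distance(U, V):
    """Frobenius norm of P_U - P_V; defined even when dim U != dim V."""
    PU, PV = projector(U), projector(V)
    n = len(PU)
    return math.sqrt(sum((PU[i][j] - PV[i][j]) ** 2
                         for i in range(n) for j in range(n)))

# A line and a plane in R^3 that contains it (illustrative data).
line = [[1.0, 0.0, 0.0]]
plane = [[1.0, 1.0, 0.0], [1.0, -1.0, 0.0]]
d = projection_distance(line, plane)
```

Because the distance is computed from projectors rather than from basis vectors, it does not depend on which basis represents each subspace, which matches the remark that the subspace itself is what matters.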

Augustyn Markiewicz, Poznań University of Life Sciences
Optimal neighbor designs under several interference models

The concept of neighbor designs was introduced and defined by Rees (1967), who also gave some methods for their construction. Since then, many methods for constructing neighbor designs and their generalizations have become available in the literature. However, there are only a few results on their optimality. The aim of the talk is therefore to give an overview of research on this problem, including some recent results on the optimality of specified neighbor designs under various linear models. Optimality will be studied with respect to the estimation of a given subvector of parameters.

Mika Mattila, University of Tampere
Estimating the eigenvalues of meet and join matrices
Coauthors: Pentti Haukkanen

Let $(P, \leq)$ be a lattice and $f$ be a real-valued function on $P$. In addition, let $S=\{x_1, \ldots, x_n\}$ be a subset of $P$ whose elements are distinct and arranged so that $x_i \leq x_j \Rightarrow i \leq j$. The $n \times n$ matrix having $f(x_i \wedge x_j)$ as its $ij$ element is the meet matrix of the set $S$ with respect to $f$ and is denoted by $(S)_f$. Similarly, the $n \times n$ matrix having $f(x_i \vee x_j)$ as its $ij$ element is the join matrix of the set $S$ with respect to $f$ and is denoted by $[S]_f$. In the case $(P, \leq)=(\mathbb{Z}^+, |)$, the matrices $(S)_f$ and $[S]_f$ are referred to as the GCD and LCM matrices of the set $S$ with respect to $f$.

Although meet-related matrices have been studied a lot over the years, not much is known about their eigenvalues. Most of the existing results concern special cases such as GCD and LCM matrices. In fact, currently there is only one paper that considers the eigenvalues of meet and join matrices (see [2]).

In this presentation we generalize Hong's and Enoch Lee's [1] method and derive upper bounds for the eigenvalues of certain meet and join matrices $(S)_f$ and $[S]_f$. As examples we consider the so-called power GCD, power GCUD, reciprocal power LCM and MIN matrices.
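
As a small illustration of the definitions above, the following sketch builds GCD and LCM matrices for a divisor-ordered set of positive integers and estimates the largest eigenvalue of the GCD matrix by power iteration. The comparison uses the elementary maximum-row-sum bound, not the bounds derived in the talk; the function names and the example set are assumptions made for illustration.

```python
import math

def gcd_matrix(S, f=lambda t: t):
    """Meet matrix (S)_f on (Z+, |): entry ij is f(gcd(x_i, x_j))."""
    return [[f(math.gcd(a, b)) for b in S] for a in S]

def lcm_matrix(S, f=lambda t: t):
    """Join matrix [S]_f on (Z+, |): entry ij is f(lcm(x_i, x_j))."""
    return [[f(a * b // math.gcd(a, b)) for b in S] for a in S]

def largest_eigenvalue(M, iters=500):
    """Power iteration with max-norm scaling.

    Assumes M has strictly positive entries, so the dominant
    eigenvalue is real, positive, and simple (Perron's theorem).
    """
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(abs(t) for t in w)
        v = [t / lam for t in w]
    return lam

# Example set, ordered so that x_i | x_j implies i <= j.
S = [1, 2, 3, 4, 6, 12]
G = gcd_matrix(S)
lam_max = largest_eigenvalue(G)
row_sum_bound = max(sum(row) for row in G)  # elementary upper bound
```

Passing a different `f`, for example `lambda t: t ** 2`, gives a power GCD matrix of the kind mentioned in the abstract.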

References:

[1] S. Hong and K. S. Enoch Lee, Asymptotic behavior of eigenvalues of reciprocal power LCM matrices, Glasg. Math. J. 50 (2008) 163-174.

[2] P. Ilmonen, P. Haukkanen and J. K. Merikoski, On eigenvalues of meet and join matrices associated with incidence functions, Linear Algebra Appl. 429 (2008) 859-874.

Syeda Rabab Mudakkar, Lahore School of Economics
Rademacher inequalities for Operators
Coauthors: Sergey Utev

This work is motivated by optimal bounds in Rosenthal and Khintchine type moment inequalities. We establish several comparison results for commutative and non-commutative random variables including random matrices and freely independent variables.

Joseph Nzabanita, National University of Rwanda, Linköping University
Multivariate linear models with Kronecker product and linear structures on the covariance matrices
Coauthors: Dietrich von Rosen (Swedish University of Agricultural Sciences), Martin Singull (Linköping University)

Models based on a normally distributed random matrix are studied. For these models, the dispersion matrix has the so-called Kronecker product structure, and they can be used, for example, to model data with spatio-temporal relationships. Our aim is to estimate the parameters of the model when, in addition, one covariance matrix is assumed to be linearly structured and the mean has a bilinear structure. On the basis of n independent observations from a matrix normal distribution, estimating equations in a flip-flop relation are established and numerical examples are given.
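
The abstract gives no formulas, so the following is only a sketch of how such a model is commonly written, under assumed notation: $X$ is a $p \times q$ random matrix, $\Sigma$ ($p \times p$) and $\Psi$ ($q \times q$) are the two covariance matrices, $A$ and $C$ are known design matrices, $B$ is the unknown mean parameter, and $G_1, \ldots, G_m$ are known symmetric matrices. All of these symbols are hypothetical.

$$
X \sim N_{p,q}(M, \Sigma, \Psi)
\quad\Longleftrightarrow\quad
\operatorname{vec}(X) \sim N_{pq}\bigl(\operatorname{vec}(M),\; \Psi \otimes \Sigma\bigr),
$$

$$
M = ABC \quad \text{(bilinear mean)}, \qquad
\Sigma = \sum_{k=1}^{m} \theta_k G_k \quad \text{(one possible linear structure)}.
$$

In a flip-flop scheme, one alternates between updating the estimate of one covariance matrix with the other held fixed until the estimates stabilize.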

Haruhiko Ogasawara, Otaru University of Commerce
Bias adjustment minimizing the asymptotic mean square error

A method of bias adjustment which minimizes the asymptotic mean square error is presented for an estimator typically given by maximum likelihood. Generally, this adjustment includes unknown population values. However, in some examples, the adjustment does not include population values. In the case of a logit, a reasonable fixed known value for the adjustment is found, which gives an asymptotic mean square error smaller than those of the asymptotically unbiased estimator and the maximum likelihood estimator. The weighted-score method, which directly yields the estimator with the minimized asymptotic mean square error, is also given.

Yuichiro Ogawa, Tokyo University of Science
A cut-off point for diagonal discriminant analysis in high dimension
Coauthors: Takayuki Yamada (Nihon University) and Takashi Seo (Tokyo University of Science)

We consider the discriminant analysis of two groups when the dimension is larger than the total sample size. The diagonal discriminant rule (DDR) is known as a popular rule for high-dimensional discrimination. The DDR treated here is based on Fisher's linear discriminant rule (W-rule) and the likelihood ratio rule (Z-rule). We propose a cut-off point such that the limiting error rate takes the minimum value under the high-dimensional framework A1: $N_1, N_2, p \to \infty$, $N_1/p \to c_1 \in (0,\infty)$, $N_2/p \to c_2 \in (0,\infty)$, $N_1/N_2 \to c \in (0,\infty)$. By Monte Carlo simulation, we confirmed that our proposed cut-off point attains a lower error rate than the zero cut-off point.
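
For readers unfamiliar with diagonal discriminant rules, the sketch below shows a generic diagonal version of Fisher's linear rule, in which the pooled covariance matrix is replaced by its diagonal so that no matrix inversion is needed when the dimension exceeds the sample size. The proposed cut-off point is not given in the abstract, so `cutoff` is left as a free, hypothetical parameter with the zero cut-off as default; the data are made up for illustration.

```python
def diagonal_discriminant(train1, train2):
    """Return a score function for a diagonal linear discriminant rule.

    Only per-feature pooled variances are used, so the rule is
    computable even when the dimension p exceeds N1 + N2.
    """
    n1, n2 = len(train1), len(train2)
    p = len(train1[0])
    mean1 = [sum(row[j] for row in train1) / n1 for j in range(p)]
    mean2 = [sum(row[j] for row in train2) / n2 for j in range(p)]
    pooled_var = []
    for j in range(p):
        ss = sum((row[j] - mean1[j]) ** 2 for row in train1)
        ss += sum((row[j] - mean2[j]) ** 2 for row in train2)
        pooled_var.append(ss / (n1 + n2 - 2))

    def score(x):
        # Sum over features of (x_j - midpoint_j) * (mean difference_j) / variance_j
        return sum(
            (x[j] - 0.5 * (mean1[j] + mean2[j])) * (mean1[j] - mean2[j]) / pooled_var[j]
            for j in range(p)
        )

    return score

def classify(score, x, cutoff=0.0):
    """Assign x to group 1 if its score exceeds the cut-off, else to group 2."""
    return 1 if score(x) > cutoff else 2

# Illustrative data with p = 5 features and only N1 + N2 = 4 observations.
group1 = [[1.0, 2.0, 0.0, 1.0, 2.0], [2.0, 3.0, 1.0, 0.0, 1.0]]
group2 = [[5.0, 7.0, 4.0, 6.0, 5.0], [6.0, 8.0, 5.0, 5.0, 6.0]]
score = diagonal_discriminant(group1, group2)
label_a = classify(score, [1.5, 2.5, 0.5, 0.5, 1.5])
label_b = classify(score, [5.5, 7.5, 4.5, 5.5, 5.5])
```

Changing `cutoff` shifts the decision boundary between the two groups, which is the quantity the talk proposes to tune.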

Sathish Pichika, McMaster University
Integration of miRNA and mRNA Expressions: An Application of Sparse Canonical Correlation Analysis (SCCA)
Coauthors: Joseph Beyene

Canonical Correlation Analysis (CCA) is a multivariate statistical method that can be used to find linear relationships between two datasets. In high-dimensional data, where the number of variables in each dataset is very large and the sample size is relatively small, findings will lack robustness and biological interpretability. Modern statistical and computational approaches are emerging to deal with this challenge. One such method is Sparse CCA (SCCA), where some of the variable coefficients are forced to zero, leaving a sparse set of variables to interpret. SCCA finds maximally correlated linear combinations of the two datasets that include only small subsets of variables. We illustrate the method using real genomic datasets. Furthermore, we believe that integrating genomic data allows us to understand the fundamental biological processes and may help in elucidating causes of complex diseases.

Simo Puntanen, University of Tampere
Formulas Useful for Linear Regression Analysis and Related Matrix Theory
Coauthors: George P. H. Styan (McGill University)

Even though a huge number of formulas related to linear models are available in the statistical literature, it is not always easy to find them when needed. The purpose of this collection is to put together a good bunch of helpful rules, within a limited number of pages, however. They all exist in the literature but are pretty much scattered. The first version (technical report) of the Formulas appeared in 1996 (54 pages) and the fourth one in 2008. Since those days, the authors have never left home without the Formulas.

This book is not a regular textbook; it is supporting material for courses given in linear regression (and also in multivariate statistical analysis). Such courses are extremely common in universities providing teaching in quantitative statistical analysis.

Reference
Simo Puntanen, George P. H. Styan & Jarkko Isotalo (2013). Formulas Useful for Linear Regression Analysis and Related Matrix Theory: It's Only Formulas But We Like Them. Springer.

Simo Puntanen, University of Tampere
Flashes from the Second Tampere Conference in Statistics in 1987 and the first IWMS in 1990
Coauthors: George P. H. Styan (McGill University), Reijo Sund (National Institute for Health and Welfare, Helsinki), Kimmo Vehkalahti (University of Helsinki)

In this talk we show video clips from the invited talks given at

• The Second International Tampere Conference in Statistics, 1-4 June 1987.
• The first IWMS: International Workshop on Linear Models, Experimental Designs, and Related Matrix Theory, Tampere, 6-8 August 1990.
http://people.uta.fi/~simo.puntanen/Program-1987-conference-Tampere.pdf
http://people.uta.fi/~simo.puntanen/Proceedings-87-front-matter.pdf
http://people.uta.fi/~simo.puntanen/Tampere-Conference-87-poster.jpg
http://www.sis.uta.fi/tilasto/iwms/program-IWMS-1990-Tampere.pdf
With the assistance of Jarmo Niemela, our aim is to provide free access to the recorded talks by October 2013.

David Titley-Peloquin, CERFACS
Stochastic conditioning of systems of equations
Coauthors: Serge Gratton (CERFACS and ENSEEIHT, Toulouse, France)

Given nonsingular $A \in \mathbb{R}^{n \times n}$ and $b \in \mathbb{R}^n$, how sensitive is $x = A^{-1}b$ to perturbations in the data? This is a fundamental and well-studied question in numerical linear algebra. Bounds on $\|A^{-1}b - (A+E)^{-1}b\|$ can be stated using the condition number of the mapping $(A, b) \mapsto A^{-1}b$, provided $\|A^{-1}E\| < 1$. If $\|A^{-1}E\| \geq 1$ nothing can be said, as $A+E$ might be singular. These well-known results answer the question: how sensitive is $x$ to perturbations in the worst case? However, they say nothing of the typical or average case sensitivity.

We consider the sensitivity of $x$ to random noise $E$. Specifically, we are interested in properties of the random variable

$$m(A, b, E) = \|A^{-1}b - (A+E)^{-1}b\| = \|(A+E)^{-1}E x\|,$$

where $\operatorname{vec}(E) \sim (0, S)$ follows various distributions. We attempt to quantify the following:

• How descriptive on average is the known worst-case analysis for small perturbations?
• Can anything be said about typical sensitivity of $x$ even for large perturbations?

We provide asymptotic results for $\|S\| \to 0$ as well as bounds that hold for large $\|S\|$. We extend some of our results to structured perturbations as well as to the full rank linear least squares problem.
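
The gap between worst-case and typical sensitivity can be explored numerically. The sketch below is a small Monte Carlo experiment on a fixed 2×2 system, comparing each sampled value of m(A, b, E) with a standard worst-case bound computed from Frobenius norms. The matrix, right-hand side, noise level, and the choice of independent uniform noise entries are all assumptions made for illustration; they are not taken from the talk.

```python
import math
import random

def solve2(M, v):
    """Solve the 2x2 system M x = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(v[0] * M[1][1] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - v[0] * M[1][0]) / det]

def fro(M):
    """Frobenius norm (an upper bound on the spectral norm)."""
    return math.sqrt(sum(e * e for row in M for e in row))

def norm(v):
    """Euclidean vector norm."""
    return math.sqrt(sum(e * e for e in v))

def sensitivity_samples(A, b, eps, n_samples, seed=0):
    """Sample pairs (m, bound) for E with i.i.d. Uniform(-eps, eps) entries.

    bound = k * ||x|| / (1 - k), with k = ||A^{-1}||_F * ||E||_F < 1,
    is a worst-case bound on m = ||A^{-1} b - (A + E)^{-1} b||.
    """
    rng = random.Random(seed)
    x = solve2(A, b)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    A_inv = [[A[1][1] / det, -A[0][1] / det],
             [-A[1][0] / det, A[0][0] / det]]
    out = []
    for _ in range(n_samples):
        E = [[rng.uniform(-eps, eps) for _ in range(2)] for _ in range(2)]
        AE = [[A[i][j] + E[i][j] for j in range(2)] for i in range(2)]
        x_pert = solve2(AE, b)
        m = norm([x[i] - x_pert[i] for i in range(2)])
        k = fro(A_inv) * fro(E)
        out.append((m, k * norm(x) / (1.0 - k)))
    return out

samples = sensitivity_samples([[2.0, 1.0], [1.0, 3.0]], [1.0, 2.0],
                              eps=0.1, n_samples=2000)
# Average of (observed sensitivity) / (worst-case bound).
mean_ratio = sum(m / bound for m, bound in samples) / len(samples)
```

A `mean_ratio` well below 1 would indicate that, for this particular system and noise model, the worst-case bound is pessimistic on average. With `eps=0.1` the condition k < 1 holds for every sample, so the bound is always defined.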

Julia Volaufova, LSUHSC School of Public Health
Two-stage approximate testing in nonlinear mixed models
Coauthors: Jeff Burton, Pennington Biomedical Research Center, LSU, Baton Rouge, USA

We investigate here approximate small sample F-tests about fixed effects parameters in nonlinear mixed models via two possible approaches to estimation of population parameters. One is based on approximating the marginal likelihood using Gaussian quadrature. The Wald-type test statistic in this case uses the estimation covariance matrix based on the approximate Fisher information matrix. The adjustment coefficient and denominator degrees of freedom depend on the total sample size and the number of population fixed-effects parameters.

The second is the two-stage approach for the case when the number of observations per sampling unit is large enough. The approximate F-test is developed based on a normal approximation to the distribution of nonlinear least squares estimates of subject-specific individual parameters, which constitute the response for the second stage. The second-stage model results in a mixed model with covariance matrix dependent on the unknown variance components as well as on the fixed effects population parameters. We consider this two-stage approach and suggest the use of an approximate F-test based on approximate maximum likelihood estimates of all model parameters. Here we focus on comparing the performance of approximate tests under the null hypothesis, especially accuracy of p-values, via simulation studies conducted for two types of pharmacokinetic models.

Hans Joachim Werner, University of Bonn
In the Year of Statistics: C. R. Rao's IPM Method - Revisited and Extended

In the framework of the general (possibly singular) linear statistical model, we discuss in particular an extended IPM-type method, which is a unified method not only for obtaining estimates but also for obtaining predictions and estimated prediction error dispersions.

Arezou Habibi Rad
Inference based on unified hybrid censored data for Weibull distribution

ABSTRACT WITHDRAWN