# SCIENTIFIC PROGRAMS AND ACTIVITIES

January 18, 2017

## June 9 to 11, 2011: International Workshop on Perspectives on High-dimensional Data Analysis

Fields Institute, Toronto. Jointly held as a satellite meeting with the 39th Annual Meeting of the

Organizing Committee
S. Ejaz Ahmed (chair), University of Windsor
Peter X. K. Song, University of Michigan
Mu Zhu, University of Waterloo

### Keynote Speakers

Rudy Beran, University of California-Davis

Estimating Many Means: A Mosaic of Recent Methodologies
A fundamental data structure is the k-way layout of observations, complete or incomplete, balanced or unbalanced. The cells of the layout are indexed by all k-fold combinations of the levels of the k covariates (or factors). Replication of observations within cells may be rare or nonexistent. Observations may be available for only a subset of the cells. The problem is to estimate the mean observation, or mean potential observable, for each cell in the k-way layout. Equivalently, the problem is to estimate an unknown regression function that depends on k covariates.

This talk unifies a mosaic of recent methodologies for estimating means in the general k-way layout or k-covariate regression problem. Included are penalized least squares with multiple quadratic penalties, associated Bayes estimators, associated submodel fits, multiple Stein shrinkage, and functional data analysis as a limit scenario. The focus is on the choice of tuning parameters to minimize estimated quadratic risk under a minimally restrictive data model; and on the asymptotic risk of the chosen estimator as the number of observed cells in the k-way layout tends to infinity.
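The "multiple Stein shrinkage" ingredient above can be illustrated in its simplest form. Below is a minimal sketch of the classical positive-part James-Stein estimator for many means with known variance; the talk's tuned, risk-estimated versions are considerably more general.

```python
import numpy as np

def james_stein(z, sigma2=1.0):
    """Positive-part James-Stein estimator.

    z: observed means, z_i ~ N(theta_i, sigma2), one per cell.
    For k >= 3 means, shrinking toward zero dominates the raw
    observations z in total squared-error risk.
    """
    z = np.asarray(z, dtype=float)
    k = z.size
    # common shrinkage factor, clipped at zero (the "positive part")
    shrink = 1.0 - (k - 2) * sigma2 / np.sum(z ** 2)
    return max(shrink, 0.0) * z
```

With all true means equal to zero (the worst case for the unshrunken estimator), the shrinkage estimate has far smaller squared error than the raw observations.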

========================
Jiahua Chen, University of British Columbia

Advances in EM-test for Finite Mixture Models
Making valid and effective inferences for finite mixture models has long been known to be technically challenging. Due to the non-regularity, the likelihood ratio test was found to diverge to infinity if the parameter space is not artificially confined to a compact set. Even under the compactness assumption, the limiting distribution is often a function of the supremum of some Gaussian process. Such results are of theoretical interest but not useful in applications. Recently, many new tests have been proposed to address this problem. The EM-test has been found superior in many respects. For many classes of finite mixture models, we have tailor-designed EM-tests that have easy-to-use limiting distributions. Simulations indicate that the limiting distributions approximate the finite-sample distributions well in the examples investigated. A general procedure for choosing the tuning parameter has also been developed.
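As a rough illustration of the iteration underlying an EM-test, here is a plain EM fit of a two-component normal mixture with unit variances. This is only the generic EM step; the actual EM-test adds a penalty on the mixing proportion and a fixed grid of starting values, which are omitted here.

```python
import numpy as np

def em_two_normals(x, p=0.5, mu=(-1.0, 1.0), n_iter=50):
    """Basic EM for a two-component N(mu1,1)/N(mu2,1) mixture."""
    x = np.asarray(x, dtype=float)
    mu1, mu2 = mu
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1
        d1 = p * np.exp(-0.5 * (x - mu1) ** 2)
        d2 = (1 - p) * np.exp(-0.5 * (x - mu2) ** 2)
        w = d1 / (d1 + d2)
        # M-step: update the mixing proportion and the two means
        p = w.mean()
        mu1 = np.sum(w * x) / np.sum(w)
        mu2 = np.sum((1 - w) * x) / np.sum(1 - w)
    return p, mu1, mu2
```

On well-separated simulated data the iteration recovers both component means and the mixing proportion.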

========================
Xihong Lin, Harvard University

Hypothesis Testing and Variable Selection for Studying Rare Variants in Sequencing Association Studies
Sequencing studies are increasingly being conducted to identify rare variants associated with complex traits. The limited power of classical single marker association analysis for rare variants poses a central challenge in such studies. We propose the sequence kernel association test (SKAT), a supervised, flexible, computationally efficient regression method to test for association between genetic variants (common and rare) in a region and a continuous or dichotomous trait, while easily adjusting for covariates. As a score-based variance component test, SKAT can quickly calculate p-values analytically by fitting the null model containing only the covariates, and so can easily be applied to genome-wide data. Using SKAT to analyze a genome-wide sequencing study of 1000 individuals, by segmenting the whole genome into 30kb regions, requires only 7 hours on a laptop. Through analysis of simulated data across a wide range of practical scenarios and triglyceride data from the Dallas Heart Study, we show that SKAT can substantially outperform several alternative rare-variant association tests. We also provide analytic power and sample size calculations to help design candidate gene, whole exome, and whole genome sequence association studies. We also discuss variable selection methods to select causal variants.
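The score statistic at the heart of a SKAT-type test can be sketched as follows. This assumes a continuous trait, an identity-link null model, and a weighted linear kernel, and it omits the mixture-of-chi-squares null distribution used to obtain analytic p-values.

```python
import numpy as np

def skat_q(y, X, G, weights=None):
    """Score-type variance-component statistic in the spirit of SKAT.

    y: continuous trait (n,); X: covariates including an intercept (n, q);
    G: genotypes for the variants in the region (n, m).
    Returns Q = r' G W G' r, where r are residuals from the null
    model y ~ X and W = diag(weights).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # fit the null model once
    r = y - X @ beta                               # null-model residuals
    w = np.ones(G.shape[1]) if weights is None else np.asarray(weights, float)
    # r' G W G' r computed variant-by-variant
    return float(np.sum(w * (G.T @ r) ** 2))
```

Because the null model is fitted only once, the same residual vector can be reused across all regions of the genome, which is what makes the genome-wide scan cheap.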

### Invited Speakers

Ejaz Ahmed, University of Windsor

System/Machine Bias versus Human Bias: Generalized Linear Models
Penalized and shrinkage regression methods have been widely used in high-dimensional data analysis. Much recent work has focused on penalized least squares methods in linear models. In this talk, I consider estimation in generalized linear models when there are many potential predictor variables and some of them may have no influence on the response of interest. In the context of two competing models, where one model includes all predictors and the other restricts variable coefficients to a candidate linear subspace based on prior knowledge, we investigate the relative performance of the absolute penalty estimator (APE) and shrinkage estimators in the direction of the subspace. We develop a large-sample asymptotic analysis for the shrinkage estimators. The asymptotics and a Monte Carlo simulation study show that the shrinkage estimator performs better than the benchmark estimators. Further, it performs better than the APE when the dimension of the restricted parameter space is large. The estimation strategies considered in this talk are also applied to a real-life data set for illustrative purposes.

========================
Pierre Alquier, Université Paris 7 and CREST

Bayesian estimators in high dimension: PAC bounds and Monte Carlo methods
Coauthors: Karim Lounici (Georgia Institute of Technology), Gérard Biau (Université Paris 6)
The problem of sparse estimation in high dimension has received a lot of attention in the last ten years. However, finding an estimator with both satisfactory statistical and computational properties is still an open problem. For example, the LASSO can be computed efficiently, but its statistical properties require strong assumptions on the observations. On the other hand, BIC does not require such hypotheses but cannot be computed efficiently in very high dimension. We propose the so-called PAC-Bayesian method (McAllester 1998, Catoni 2004, Dalalyan and Tsybakov 2008) as an alternative approach. We build a Bayesian estimator that satisfies a tight PAC bound and compute it using reversible jump Markov chain Monte Carlo methods. A first version, proposed in joint work with Karim Lounici, deals with the linear regression problem, while the work with Gérard Biau extends these results to the single index model.

========================
Shojaeddin Chenouri, University of Waterloo
Coauthors: Sam Behseta (California State University, Fullerton)

Comparison of Two Populations of Curves with an Application in Neuronal Data Analysis
Often in neurophysiological studies, scientists are interested in testing hypotheses regarding the equality of the overall intensity functions of a group of neurons recorded under two different experimental conditions. In this talk, we consider such a hypothesis testing problem. We propose two tests: a parametric one based on Hotelling's $T^2$ statistic, and a nonparametric one based on the spatial signed-rank test statistic of M\"{o}tt\"{o}nen and Oja (1995). We implement these tests on smooth curves obtained by fitting Bayesian Adaptive Regression Splines (BARS) to the intensity functions of neuronal Peri-Stimulus Time Histograms (PSTH).
Through simulation, we show that the powers of our proposed tests are extremely high even when the number of sampled neurons and the number of trials per neuron are small. Finally, we apply our methods to a group of motor cortex neurons recorded during a reaching task.
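For the parametric test, the two-sample Hotelling $T^2$ statistic applied to vectors of fitted-curve coefficients might look like the minimal sketch below (the rescaling to an F reference distribution, and the BARS fitting step itself, are omitted).

```python
import numpy as np

def hotelling_t2(X, Y):
    """Two-sample Hotelling T^2 statistic for equality of mean vectors.

    Rows are observations (e.g. basis coefficients of fitted intensity
    curves for individual neurons), columns are dimensions.
    """
    n1, n2 = len(X), len(Y)
    d = X.mean(axis=0) - Y.mean(axis=0)          # difference of sample means
    # pooled sample covariance of the two groups
    S = ((n1 - 1) * np.cov(X, rowvar=False)
         + (n2 - 1) * np.cov(Y, rowvar=False)) / (n1 + n2 - 2)
    return float((n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d))
```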

========================
Coauthors: Fan Yang, Kam Tsui

Biomedical large scale inference
I will describe methods used by population and medical geneticists to analyse associations between disease and genetic markers. These methods are able to handle data with hundreds of thousands of variables by using dual principal component analysis. I will compare these methods to frequentist and Bayesian methods from the field of statistics.
This is joint work with Fan Yang and Kam Tsui

========================
Yang Feng, Columbia University
Coauthors: Tengfei Li, Wen Yu, Zhiliang Ying, Hong Zhang

Loss Adaptive Modified Penalty in Variable Selection
For variable selection, balancing sparsity and stability is an important task. In this work, we propose the Loss Adaptive Modified Penalty (LAMP), in which the penalty function is adaptively changed with the type of the loss function. For generalized linear models, we provide a unified form of the penalty corresponding to each specific exponential family. We show that LAMP can have asymptotic stability while achieving oracle properties. In addition, LAMP can be seen as a special functional of a conjugate prior. An efficient coordinate-descent algorithm is proposed and a balancing method is introduced. Simulation results show that LAMP has competitive performance compared with several well-known penalties.
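A bare-bones version of the coordinate-descent idea is shown below for the ordinary lasso (squared-error loss with an $\ell_1$ penalty); LAMP itself would substitute its loss-adapted penalty into the coordinate update, so treat this only as a sketch of the algorithmic skeleton.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0) / n          # per-column scale factors
    for _ in range(n_iter):
        for j in range(p):
            # partial residual: remove coordinate j's current contribution
            r_j = y - X @ b + X[:, j] * b[j]
            rho = X[:, j] @ r_j / n
            # soft-thresholding update for coordinate j
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_ss[j]
    return b
```

On a toy problem with one strong signal, the irrelevant coefficients are thresholded to (near) zero while the true coefficient survives with a small shrinkage bias.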

========================
D. A. S. Fraser, University of Toronto

High-Dimensional: The Barrier and Bayes and Bias
We all aspire to breach the barrier and we do; and yet it always re-forms, more formidable. In the context of a statistical model and data, two familiar approaches are: slicing, which uses only a data-slice of the model, namely the likelihood function, perhaps with a calibrating weight function or prior; and bridging, which uses derivatives at infinity to cantilever back over the barrier to first, second, and third order. Both have had remarkable successes and both involve risks that can be serious.

We all have had confrontations with the boundary and I'll start with a comment on my first impact. The slicing I refer to is the use of the data slice, the likelihood function, as the sole or primary model summary. This can be examined in units that are data-standardized and free from model curvature, and the related gradient of the log-prior then gives a primary calibration of the prior; the initiative in this direction is due to Welch and Peers (1963) but its prescience was largely overlooked. The bridging is the Taylor expansion about infinity with analysis from asymptotics. From these we obtain an order-of-magnitude calibration of the effect of a prior on the basic slice information; this leads to the direction and the magnitude of the bias that derives from the use of a prior to do a statistical analysis.

========================
Xin Gao, York University
Coauthors: Peter Song, Yuehua Wu

Model selection for high-dimensional data with applications in feature selection and network building
For high-dimensional data sets with complicated dependency structures, the full likelihood approach often leads to intractable computational complexity. This imposes difficulty on model selection, as most of the traditionally used information criteria require evaluation of the full likelihood. We propose a composite likelihood version of the Bayesian information criterion (BIC) and establish its consistency for selecting the true underlying marginal model. Under some mild regularity conditions, the proposed BIC is shown to be selection consistent, where the number of potential model parameters is allowed to increase to infinity at a certain rate with the sample size. In this talk, we will also discuss the result that using a modified Bayesian information criterion (BIC) to select the tuning parameter in penalized likelihood estimation of a Gaussian graphical model can lead to consistent network model selection even when $P$ increases with $N$, as long as all the network edges are contained in a bounded subset.
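The plug-in form of a BIC-type criterion with a composite likelihood can be sketched as below. Note that the talk's criterion also adjusts the penalty for the curvature (Godambe information) of the composite likelihood; this toy version uses the raw parameter count.

```python
import numpy as np

def composite_bic(log_cl, n_params, n):
    """BIC-type criterion with a composite log-likelihood plugged in
    place of the full log-likelihood: -2 * logCL + d * log(n).
    Smaller values indicate a preferred model.
    """
    return -2.0 * log_cl + n_params * np.log(n)
```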

========================
Xiaoli Gao, Oakland University

The fused lasso penalty is commonly used in signal processing when the hidden true signals are sparse and blocky. The $\ell_1$ loss has some robustness properties when the additive noise is contaminated by outliers. In this work, we study the asymptotic properties of an LAD-fused-lasso model used as a signal approximation (LAD-FLSA). We first investigate the estimation consistency properties of an LAD-FLSA estimator. Then we provide some conditions under which an LAD-FLSA estimator can be both block selection consistent and sign consistent. We also provide an unbiased estimate of the generalized degrees of freedom (GDF) of the LAD-FLSA modeling procedure for any given tuning parameters. The effect of the unbiased estimate is demonstrated using simulation studies.

========================
Yulia Gel, University of Waterloo
Coauthors: Peter Bickel, University of California, Berkeley

Banded regularization of autocovariance matrices in application to parameter estimation and forecasting of time series
This talk addresses a "large p, small n" problem in a time series framework and considers properties of banded regularization of the empirical autocovariance matrix of a time series process. Utilizing the banded autocovariance matrix enables us to fit a much longer model to the observed data than typically suggested by AIC, while controlling how many parameters are to be estimated precisely and the level of accuracy. We present results on the asymptotic consistency of banded autocovariance matrices under the Frobenius norm and provide a theoretical justification for optimal band selection using cross-validation. Remarkably, the cross-validation loss function for banded prediction is related to the conditional mean square prediction error (MSPE) and thus may be viewed as an alternative model selection criterion. The proposed procedure is illustrated by simulations and an application to predicting the sea surface temperature (SST) index in the Nino 3.4 region.
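Banding an empirical autocovariance matrix is straightforward to sketch. Here `band` stands in for the bandwidth that the talk selects by cross-validation; everything beyond that lag is simply zeroed out.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """Empirical autocovariances gamma_hat(0..max_lag) of a series x."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    return np.array([x[:n - h] @ x[h:] / n for h in range(max_lag + 1)])

def banded_autocov_matrix(x, dim, band):
    """dim x dim Toeplitz autocovariance matrix with entries beyond
    lag `band` set to zero (the banding regularization)."""
    g = sample_autocov(x, dim - 1)
    return np.array([[g[abs(i - j)] if abs(i - j) <= band else 0.0
                      for j in range(dim)] for i in range(dim)])
```

The banded matrix stays symmetric Toeplitz, keeps the well-estimated short-lag covariances, and discards the noisy long-lag ones.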

========================
Jiashun Jin, Carnegie Mellon University

Coauthors: Pengsheng Ji

UPS delivers optimal phase diagram in high dimensional variable selection
We consider a linear regression model where both $p$ and $n$ are large but $p > n$. The vector of coefficients is unknown but is sparse in the sense that only a small proportion of its coordinates is nonzero, and we are interested in identifying these nonzero ones. We propose a two-stage variable selection procedure which we call the {\it UPS}. This is a Screen and Clean procedure, in which we screen with the Univariate thresholding, and clean with the Penalized MLE.
In many situations, the UPS possesses two important properties: Sure Screening and Separable After Screening (SAS). These properties enable us to reduce the original regression problem to many small-size regression problems that can be fitted separately. As a result, the UPS is effective both in theory and in computation. The lasso and subset selection are well-known approaches to variable selection. However, somewhat surprisingly, there are regions where neither the lasso nor subset selection is rate optimal, even for very simple design matrices. The lasso is non-optimal because it is too loose in filtering out fake signals (i.e. noise that is highly correlated with a signal), and subset selection is non-optimal because it tends to kill one or more signals in correlated pairs, triplets, etc.
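The first (screening) stage of such a Screen and Clean procedure can be sketched with a simple marginal threshold; the cleaning stage, a penalized MLE refit on the survivors, is omitted, and the threshold `t` is an illustrative tuning parameter.

```python
import numpy as np

def univariate_screen(X, y, t):
    """Univariate thresholding: keep coordinates whose marginal
    statistic z_j = x_j' y / n exceeds t in absolute value."""
    n = X.shape[0]
    z = X.T @ y / n                     # one statistic per coordinate
    return np.flatnonzero(np.abs(z) > t)
```

With one strong signal among many null coordinates, the screen retains the signal and eliminates almost everything else, leaving only a small problem for the cleaning stage.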

========================
Timothy D. Johnson, University of Michigan, Department of Biostatistics

Computational Speedup in Spatial Bayesian Image Modeling via GPU Computing
Spatial modeling is a computationally complex endeavor due to the spatial correlation structure in the data that must be taken into account in the modeling. This endeavor is even more computationally complex for 3D data/images (the curse of dimensionality) and within the Bayesian framework, due to posterior distributions that are not analytically tractable and thus must be approximated via MCMC simulation. For point reference data, dimension reduction techniques, such as Gaussian predictive process models, have alleviated some of the computational burden; however, for image data and point pattern data, these dimension reduction techniques may not be applicable. Two examples are a population-level fMRI hierarchical model where image correlation is accounted for in the weights of a finite mixture model, and a log-Gaussian Cox process model of lesion location in patients with multiple sclerosis. Both of these models are extremely computationally intense due to the complex nature of the likelihoods and the size of the 3D images. However, both likelihoods are amenable to parallelization. Although the MCMC simulation itself cannot be parallelized, by making small, rather straightforward changes to the code and porting the likelihood computation to a graphical processing unit (GPU), I have achieved an increase of over two orders of magnitude in computational efficiency in these two problems.

========================
Abbas Khalili, McGill University
Coauthors: Shili Lin, Department of Statistics, The Ohio State University

Regularization in finite mixture of regression models with diverging number of parameters
Feature (variable) selection has become a fundamentally important problem in the recent statistical literature. Often, in applications, many variables are introduced to reduce possible modeling biases. The number of introduced variables thus depends on the sample size, which reflects the estimability of the parametric model. In this paper, we consider the problem of feature selection in finite mixture of regression models when the number of parameters in the model can increase with the sample size. We propose a penalized likelihood approach for feature selection in these models. Under certain regularity conditions, our approach leads to consistent variable selection. We carry out a simulation study to evaluate the performance of the proposed approach under controlled settings. Real data on Parkinson's disease are also analyzed. The data concern whether dysphonic features extracted from the patients' speech signals recorded at home can be used as surrogates to study PD severity and progression. Our analysis of the PD data yields interpretable results that can be of important clinical value. The stratification of dysphonic features for patients with mild and severe symptoms leads to novel insights beyond the current literature.

========================
Peter Kim, University of Guelph

Testing Quantum States for Purity

The simplest states of finite quantum systems are the pure states. This paper is motivated by the need to test whether or not a given state is pure. Because the pure states lie in the boundary of the set of all states, the usual regularity conditions that justify the standard large-sample approximations to the null distributions of the deviance and the score statistic are not satisfied. For a large class of quantum experiments that produce Poisson count data, this paper uses an enlargement of the parameter space of all states to develop likelihood ratio and score tests of purity. The asymptotic null distributions of the corresponding statistics are chi-squared. The tests are illustrated by the analysis of some quantum experiments involving unitarily correctable codes.

========================
Samuel Kou, Harvard University
Coauthors: Benjamin Olding

Multi-resolution inference of stochastic models from partially observed data

Stochastic models, diffusion models in particular, are widely used in science, engineering and economics. Inferring the parameter values from data is often complicated by the fact that the underlying stochastic processes are only partially observed. Examples include inference for discretely observed diffusion processes, stochastic volatility models, and doubly stochastic Poisson (Cox) processes. Likelihood-based inference faces the difficulty that the likelihood is usually not available even numerically. The conventional approach discretizes the stochastic model to approximate the likelihood. In order to achieve the desired accuracy, one has to use a highly dense discretization, which usually imposes an unbearable computational burden. In this talk we will introduce the framework of Bayesian multi-resolution inference to address this difficulty. By working on different resolution (discretization) levels simultaneously, and by letting the resolutions talk to each other, we substantially improve not only the computational efficiency but also the estimation accuracy. We will illustrate the strength of the multi-resolution approach with examples.
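The discretization the talk starts from can be illustrated with the Euler approximation to a diffusion's transition density. Below is a sketch of the resulting approximate log-likelihood at a single resolution; finer grids are more accurate but costlier to impute over, which is the tension multi-resolution inference is designed to ease.

```python
import numpy as np

def euler_loglik(x, dt, drift, sigma):
    """Euler-discretized log-likelihood of a diffusion
    dX_t = drift(X_t) dt + sigma dW_t observed at spacing dt.

    Under the Euler scheme, X_{t+dt} | X_t ~ N(X_t + drift(X_t) dt,
    sigma^2 dt), so the log-likelihood is a sum of Gaussian terms.
    """
    x = np.asarray(x, dtype=float)
    mean = x[:-1] + drift(x[:-1]) * dt     # one-step conditional means
    var = sigma ** 2 * dt                  # one-step conditional variance
    incr = x[1:] - mean
    return float(np.sum(-0.5 * np.log(2 * np.pi * var)
                        - incr ** 2 / (2 * var)))
```

For a simulated Ornstein-Uhlenbeck path, the discretized log-likelihood is higher at the data-generating drift than at a badly misspecified one.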

========================
Hua Liang, University of Rochester Medical Center

Coauthors: Hansheng Wang and Chih-Ling Tsai

Profiled Forward Regression for Ultrahigh Dimensional Variable Screening in Semiparametric Partially Linear Models
In partially linear model selection, we develop a profiled forward regression (PFR) algorithm for ultrahigh dimensional variable screening. The PFR algorithm effectively combines the ideas of nonparametric profiling and forward regression. This allows us to obtain a uniform bound for the absolute difference between the profiled and original predictors. Based on this important finding, we are able to show that the PFR algorithm discovers all relevant variables within a few fairly short steps. Numerical studies are presented to illustrate the performance of the proposed method.

========================
Yufeng Liu, University of North Carolina at Chapel Hill
Coauthors: Hao Helen Zhang (NCSU) and Guang Cheng (Purdue)

Automatic Structure Selection for Partially Linear Models
Partially linear models provide good compromises between linear and nonparametric models. However, given a large number of covariates, it is often difficult to objectively decide which covariates are linear and which are nonlinear. Common approaches include hypothesis testing methods and screening procedures based on univariate scatter plots. These methods are useful in practice; however, testing the linearity of multiple functions for large dimensional data is both theoretically and practically challenging, and visual screening methods are often ad hoc. In this work, we tackle this structure selection problem in partially linear models from the perspective of model selection. A unified estimation and selection framework is proposed and studied. The new estimator can automatically determine the linearity or nonlinearity for all covariates and at the same time consistently estimate the underlying regression functions. Both theoretical and numerical properties of the resulting estimators are presented.

========================
Jinchi Lv, University of Southern California

Coauthors: Jianqing Fan (Princeton University)

Non-Concave Penalized Likelihood with NP-Dimensionality
Penalized likelihood methods are fundamental to ultrahigh-dimensional variable selection. How high a dimensionality such methods can handle remains largely unknown. In this paper, we show that in the context of generalized linear models, such methods possess model selection consistency with oracle properties even for dimensionality of non-polynomial (NP) order in the sample size, for a class of penalized likelihood approaches using folded-concave penalty functions, which were introduced to ameliorate the bias problems of convex penalty functions. This fills a long-standing gap in the literature, where the dimensionality is typically allowed to grow only slowly with the sample size. Our results are also applicable to penalized likelihood with the L1-penalty, which is a convex function at the boundary of the class of folded-concave penalty functions under consideration. Coordinate optimization is implemented for finding the solution paths, and its performance is evaluated by a few simulation examples and a real data analysis.

========================
Bin Nan, University of Michigan
Coauthors: Xuejing Wang, Ji Zhu, Robert Koeppe

Sparse 3D Functional Regression via Haar Wavelets
PET imaging has great potential to aid diagnosis of neurodegenerative diseases, such as Alzheimer's disease or mild cognitive impairment. Commonly used region-of-interest analysis loses detailed voxel-level information. Here we propose a three-dimensional functional linear regression model, treating the PET images as three-dimensional functional covariates. Both the image and the functional regression coefficient are expanded using the same set of Haar wavelet bases, which reduces the functional regression model to a linear regression model. We find that sparsity of the original functional regression coefficient can be achieved through sparsity of the regression coefficients in the reduced model after the wavelet transformation. A lasso procedure can then be implemented with the level of the Haar wavelet expansion as an additional tuning parameter.
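A one-dimensional Haar transform conveys the basic expansion (the talk applies the same construction in three dimensions to PET volumes). This sketch assumes the input length is a power of two.

```python
import numpy as np

def haar_1d(v):
    """Orthonormal 1D Haar wavelet transform of a length-2^k vector.

    Repeatedly replaces adjacent pairs with their scaled average
    and scaled difference, halving the 'smooth' part each pass.
    """
    out = np.asarray(v, dtype=float).copy()
    n = len(out)
    while n > 1:
        half = n // 2
        a = (out[:n:2] + out[1:n:2]) / np.sqrt(2)   # averages (smooth part)
        d = (out[:n:2] - out[1:n:2]) / np.sqrt(2)   # details
        out[:half], out[half:n] = a, d
        n = half
    return out
```

Because the transform is orthonormal, it preserves the Euclidean norm, and a constant (very smooth) input concentrates all its energy in the single coarsest coefficient, which is exactly why smooth-but-blocky coefficient images become sparse in this basis.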

========================
Annie Qu, Department of Statistics, University of Illinois at Urbana-Champaign
Coauthors: Peng Wang, University of Illinois at Urbana-Champaign; Guei-feng Tsai, Center for Drug Evaluation of Taiwan

Conditional Inference Functions for Mixed-Effects Models with Unspecified Random-Effects Distribution
In longitudinal studies, mixed-effects models are important for addressing subject-specific effects. However, most existing approaches assume a normal distribution for the random effects, and this could affect the bias and efficiency of the fixed-effects estimator. Even in cases where the estimation of the fixed effects is robust with a misspecified distribution of the random effects, the estimation of the random effects could be invalid. We propose a new approach to estimate fixed and random effects using conditional quadratic inference functions. The new approach does not require the specification of likelihood functions or a normality assumption for random effects. It can also accommodate serial correlation between observations within the same cluster, in addition to mixed-effects modeling. Other advantages include not requiring the estimation of the unknown variance components associated with the random effects, or the nuisance parameters associated with the working correlations. Real data examples and simulations are used to compare the new approach with the penalized quasi-likelihood approach, and SAS GLIMMIX and nonlinear mixed effects model (NLMIXED) procedures.

========================
Sunil Rao, University of Miami, Division of Biostatistics

Coauthors: Hemant Ishwaran, Cleveland Clinic

Mixing Generalized Ridge Regressions
Hoerl and Kennard proposed generalized ridge regression (GRR) almost forty years ago as a means to overcome the deficiency of least squares in multicollinear problems. Because high-dimensional regression problems naturally involve correlated predictors, in part due to the nature of the data and in part as an artifact of the dimensionality, it is reasonable to consider GRR for addressing these problems. We study GRR in problems in which the number of predictors exceeds the sample size. We describe a novel geometric interpretation of GRR in terms of a uniquely defined least squares estimator. However, the GRR estimator is constrained to lie in a low-dimensional subspace, which limits its effectiveness. To overcome this, we introduce a mixing GRR procedure using easily constructed exponential weights and establish a finite-sample minimax bound for this procedure. A dimensionality effect appears in the bound, which poses a problem in ultra-high dimensions; we address it by using mixing GRR to filter variables. We study the performance of this procedure, as well as a hybrid method, using a range of examples.
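The basic GRR solve, with one penalty per coordinate rather than a single common ridge parameter, is simple to sketch; the mixing over many such fits with exponential weights, which is the substance of the talk, is omitted here.

```python
import numpy as np

def grr(X, y, lam):
    """Generalized ridge regression (Hoerl-Kennard form):
    beta = (X'X + diag(lam))^{-1} X'y, one penalty lam_j per coordinate.
    Remains well-defined when the number of predictors exceeds n,
    provided all lam_j > 0.
    """
    lam = np.asarray(lam, dtype=float)
    return np.linalg.solve(X.T @ X + np.diag(lam), X.T @ y)
```

With tiny equal penalties GRR reproduces least squares on a well-conditioned design, larger penalties shrink the fit toward zero, and the solve still works when p > n.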

========================

Enayetur Raheem, University of Windsor/Windsor-Essex County Health Unit
Coauthors: Kjell Doksum, S. E. Ahmed

Absolute Penalty and B-spline-based Shrinkage Estimation in Partially Linear Models
In the context of a partially linear regression model (PLM), we utilize shrinkage and absolute penalty estimation techniques for simultaneous model selection and parameter estimation. Ahmed et al. (2007), in a similar setup, considered a kernel-based estimate of the nonparametric component, while a B-spline is considered in our setup. We develop shrinkage semiparametric estimators that improve upon the classical estimators when there are nuisance covariates present in the model. In comparing two models, with and without the nuisance covariates, the shrinkage estimators take an adaptive approach in which the information contained in the nuisance variables is utilized if it is tested to be useful for the overall fit of the model. Bias expressions and risk properties of the estimators are obtained. An application of the proposed methods to a real data set is provided.

Since the B-spline can be incorporated in a regression model easily, we numerically compare the performance of our proposed method with the lasso. While both the shrinkage estimators and the lasso outperform the classical estimators, the shrinkage estimators perform better than the lasso in terms of prediction error when there are many nuisance variables and the sample size is moderately large.

========================
Xiaotong Shen, School of Statistics, University of Minnesota
Coauthors: Hsin-Cheng Huang

On simultaneous supervised clustering and feature selection
In network analysis, genes are known to work in groups according to their biological functionality, where distinct groups reveal different gene functionalities. In such a situation, identifying grouping structures as well as informative genes becomes critical to understanding the progression of a disease. Motivated by gene network analysis, we investigate, in a regression context, simultaneous supervised clustering and feature selection over an arbitrary undirected graph, where each predictor corresponds to one node in the graph and the existence of a connecting path between two nodes indicates possible grouping of the two predictors. In this talk, I will discuss methods for simultaneous supervised clustering and feature selection over a graph, and argue that supervised clustering and feature selection are complementary for identifying a simpler model with higher predictive performance. Numerical examples will be given in addition to theory.

========================
Christopher G. Small, University of Waterloo

Multivariate analysis of data in curved shape spaces
We consider some statistical methods for the analysis of images and objects whose shapes are encoded as points in Kendall shape spaces. Standard multivariate methods, applicable to data in Euclidean spaces, do not directly apply to such contexts. The talk highlights the necessity for methods which respect the essentially non-Euclidean nature of shape spaces. An application to data from anthropology will be given.

========================
Hao Helen Zhang, North Carolina State University
Coauthors: Wenbin Lu and Hansheng Wang

On Sparse Estimation for Semiparametric Linear Transformation Models
Semiparametric linear transformation models have received much attention due to their high flexibility in modeling survival data. A useful estimating equation procedure was recently proposed by Chen et al. (2002) for linear transformation models to jointly estimate parametric and nonparametric terms. They showed that this procedure can yield a consistent and robust estimator. However, the problem of variable selection for linear transformation models is less studied, partially because a convenient loss function is not readily available in this context. We propose a simple yet powerful approach to achieve both sparse and consistent estimation for linear transformation models. The main idea is to derive a profiled score from the estimating equation of Chen et al. (2002), construct a loss function based on the profiled score and its variance, and then minimize the loss subject to a shrinkage penalty. We show that the resulting estimator is consistent for both model estimation and variable selection. Furthermore, the estimated parametric terms are asymptotically normal and can achieve higher efficiency than that yielded by the estimating equations. We suggest a one-step approximation algorithm which can take advantage of the LARS path algorithm. Performance of the new procedure is illustrated through numerous simulations and real examples, including one microarray data set.

========================
Hongtu Zhu, Department of Biostatistics and Biomedical Research Imaging Center, UNC-Chapel Hill

Smoothing Imaging Data in Population Studies
Coauthors: Yimei Li, Yuan Ying, Runze Li, Steven Marron, Ja-an Lin, Jianqing Fan, John H. Gilmore, Martin Styner, Dinggang Shen, Weili Lin
Motivated by recent work studying massive imaging data in large neuroimaging studies, we propose various multiscale adaptive smoothing models (MARM) for spatially modeling the relation between high-dimensional imaging measures on a three-dimensional (3D) volume or a 2D surface and a set of covariates. Statistically, MARM can be regarded as a novel generalization of functional principal component analysis (fPCA) and varying coefficient models (VCM) to higher-dimensional spaces. We develop novel estimation procedures for MARMs and systematically study their theoretical properties. We conduct Monte Carlo simulations and real data analyses to examine the finite-sample performance of the proposed procedures.
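The baseline that multiscale adaptive models improve upon is the massively univariate approach: fitting a separate regression at every voxel of the volume. A minimal sketch of that baseline, on invented data (the covariate, volume size, and effect region are all hypothetical, not from the study):

```python
import numpy as np

rng = np.random.default_rng(3)
n_subj, shape = 50, (8, 8, 8)                # 50 subjects, a tiny 3D volume
age = rng.uniform(20, 60, n_subj)            # one hypothetical covariate
Z = np.c_[np.ones(n_subj), age]              # design matrix with intercept

# Synthetic images: one corner region truly depends on the covariate
Y = rng.standard_normal((n_subj,) + shape)
Y[:, :4, :4, :4] += 0.1 * age[:, None, None, None]

# Massively univariate fit: least squares at every voxel simultaneously
Yflat = Y.reshape(n_subj, -1)                        # subjects x voxels
coef, *_ = np.linalg.lstsq(Z, Yflat, rcond=None)     # 2 x voxels
slope = coef[1].reshape(shape)                       # covariate-effect map
print(slope.shape)                                   # (8, 8, 8)
```

Such voxel-wise fits ignore spatial structure; the multiscale adaptive approach instead borrows strength from neighboring voxels at increasing scales, which is what the abstract's proposal addresses.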

### Poster Session

S. Ejaz Ahmed and Saber Fallahpour, Department of Mathematics and Statistics
University of Windsor

L1 Penalty and Shrinkage Estimation in Partially Linear Models with Random Coefficient Autoregressive Errors
In partially linear models (PLM), we consider methodology for simultaneous model selection and parameter estimation with random coefficient autoregressive errors using lasso and shrinkage strategies. The current work extends Ahmed et al. (2007), who considered a PLM with random errors. We provide natural adaptive estimators that significantly improve upon the classical procedures in the situation where some of the predictors are nuisance variables that may or may not affect the association between the response and the main predictors. In the context of two competing partially linear regression models (full and sub-models), we consider an adaptive shrinkage estimation strategy. We develop the properties of these estimators using the notion of asymptotic distributional risk. The shrinkage estimators (SE) are shown to have higher efficiency than the classical estimators for a wide class of models. For the lasso-type estimation strategy, we devise efficient algorithms to obtain numerical results. We compare the relative performance of the lasso with the shrinkage and other estimators. Monte Carlo simulation experiments are conducted for various combinations of the nuisance parameters and sample size, and the performance of each method is evaluated in terms of simulated mean squared error. The comparison reveals that the lasso and shrinkage strategies outperform the classical procedure. The SE performs better than the lasso strategy in the effective part of the parameter space when, and only when, there are many nuisance variables in the model. A data example is showcased to illustrate the usefulness of the suggested methods.
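The general idea of shrinking a full-model estimator toward a sub-model that drops the nuisance variables can be sketched in a plain linear-regression toy example. This is a generic Stein-type shrinkage sketch, not the authors' PLM estimator with autoregressive errors; all data and dimensions are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p_main, p_nuis = 200, 3, 8
X = rng.standard_normal((n, p_main + p_nuis))
beta = np.r_[np.array([2.0, -1.0, 1.5]), np.zeros(p_nuis)]  # nuisance coefs zero
y = X @ beta + rng.standard_normal(n)

# Full-model and sub-model least squares (sub-model drops the nuisance block)
bf, *_ = np.linalg.lstsq(X, y, rcond=None)
bs, *_ = np.linalg.lstsq(X[:, :p_main], y, rcond=None)
bs_full = np.r_[bs, np.zeros(p_nuis)]

# Stein-type shrinkage of the full fit toward the sub-model, with strength
# driven by a Wald-type distance T between the two fits
resid = y - X @ bf
sigma2 = resid @ resid / (n - X.shape[1])
diff = bf - bs_full
T = diff @ (X.T @ X) @ diff / sigma2
b_shrink = bs_full + (1.0 - (p_nuis - 2) / T) * diff

mse = lambda b: np.sum((b - beta) ** 2)
print(mse(bf), mse(b_shrink))
```

In practice a positive-part version (truncating the shrinkage factor at zero) is used to avoid over-shrinkage when T is small; the adaptive estimators in the abstract refine this idea for the partially linear setting.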

Reference:
Ahmed, S. E., Doksum, K. A., Hossain, S. and You, J. (2007). Shrinkage, pretest and absolute penalty estimators in partially linear models. Aust. N. Z. J. Stat., 49, 435-454.

========================

Billy Chang, Ph.D. Candidate (Biostatistics), Dalla Lana School of Public Health, University of Toronto
Author: Billy Chang and Rafal Kustra

Regularization for Nonlinear Dimension Reduction by Subspace Constraint
Sparked by the introduction of Isomap and Locally Linear Embedding in 2000, nonlinear approaches to dimension reduction have received unprecedented attention during the past decade. Although the flexibility of such methods has given scientists powerful tools for feature extraction and visualization, their applications have focused mainly on large-sample, low-noise settings. In small-sample, high-noise settings, model regularization is necessary to avoid over-fitting. Yet over-fitting in nonlinear dimension reduction has not been widely explored, even for earlier methods such as kernel PCA and multidimensional scaling.

Regularization for nonlinear dimension reduction is a non-trivial task: while an overly complex model will over-fit, an overly simple model cannot detect highly nonlinear signals. To overcome this problem, I propose performing nonlinear dimension reduction within a lower-dimensional subspace. This way, one can increase the model complexity for the nonlinear pattern search while avoiding over-fitting, since the model is not allowed to traverse all possible dimensions. The crux of the problem lies in finding the subspace containing the nonlinear signal, and I will discuss a kernel PCA approach for the subspace search and a principal curve approach for nonlinear basis construction.
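A minimal numpy-only sketch of the two-stage idea — first restrict to a low-dimensional linear subspace, then run a nonlinear method (here kernel PCA with an RBF kernel) inside it. The data, dimensions, and bandwidth choice are illustrative assumptions, not the poster's actual procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# A nonlinear 2D signal embedded in 10 dimensions with noise:
t = rng.uniform(-3, 3, 200)
X = np.zeros((200, 10))
X[:, :2] = np.c_[t, np.sin(t)]
X += 0.05 * rng.standard_normal(X.shape)

# Step 1: linear PCA to find a low-dimensional subspace holding the signal
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                       # project onto the 2D principal subspace

# Step 2: kernel PCA (RBF kernel) *within* that subspace only
sq = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / sq.mean())             # RBF kernel, bandwidth ~ mean sq. distance
n = K.shape[0]
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H                          # double-centre the Gram matrix
vals, vecs = np.linalg.eigh(Kc)
embedding = vecs[:, -1] * np.sqrt(vals[-1])   # leading nonlinear component
print(embedding.shape)                  # (200,)
```

The subspace projection in step 1 is what caps the model's effective complexity: the kernel method can only fit nonlinear structure within the retained directions, which is the regularization mechanism the abstract describes.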
========================
Abdulkadir Hussein, Ejaz Ahmed and Marwan Al-Momani, U of Windsor

To homogenize or not to homogenize: The case of linear mixed models
The problem of whether given data support heterogeneous or homogeneous models has a long history, and perhaps its major manifestation is in the form of generalized linear mixed models. By heterogeneous models we mean models in which diversity among possible subpopulations is accommodated by using variance components. Among other areas, this problem arises in economics, finance, and biostatistics under various names such as panel, longitudinal, or cluster-correlated data. Homogeneity is a desired property, while heterogeneity is often a fact of life. In order to reconcile these two types of models and seek unity in diversity, we propose and explore several shrinkage-type estimators for the regression coefficients as well as for the variance components. We examine the merits of the different estimators using asymptotic risk assessment measures and Monte Carlo simulations. We apply the proposed methods to income panel data.

========================

Variable Selection in Multipath Change-point Problems
Follow-up studies are frequently carried out to study the evolution of one or several measurements taken on subjects through time. When a stimulus is administered to subjects, it is of interest to study the reaction times, i.e., the change-points. One may want to select the covariates that accelerate reaction to the stimulus. Selecting effective covariates in this setting poses a challenge when the number of covariates is large. We develop such a methodology and study the large-sample behavior of the method. Small-sample behavior is studied by means of simulation. The method is applied to a Parkinson's disease data set.

========================
Xin Tong, Princeton University
Coauthor: Philippe Rigollet (Princeton University)

Neyman-Pearson classification, convexity and stochastic constraints
Motivated by problems of anomaly detection, this paper implements the Neyman-Pearson paradigm to deal with asymmetric errors in binary classification with a convex loss. Given a finite collection of classifiers, we combine them to obtain a new classifier that simultaneously satisfies the following two properties with high probability: (i) its probability of type I error is below a pre-specified level, and (ii) its probability of type II error is close to the minimum possible. The proposed classifier is obtained by solving an optimization problem with an empirical objective and an empirical constraint. New techniques to handle such problems are developed and have consequences for chance-constrained programming.
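The core of the Neyman-Pearson paradigm — cap the type I error, then minimize the type II error subject to that cap — can be illustrated with a single score-based classifier and a data-driven threshold. This is a toy version of the empirical-constraint idea, not the paper's aggregation procedure; the score distributions and function name are invented.

```python
import numpy as np

def np_threshold(scores0, scores1, alpha):
    """Pick the smallest threshold whose empirical type I error (false
    alarms on class-0 scores) is at most alpha; this minimizes the
    empirical type II error among all feasible thresholds."""
    cands = np.r_[np.sort(scores0), np.inf]    # candidate thresholds
    for t in cands:
        if np.mean(scores0 >= t) <= alpha:     # empirical type I constraint
            return t, np.mean(scores1 < t)     # threshold, type II error
    raise RuntimeError("unreachable: t = inf always satisfies the constraint")

rng = np.random.default_rng(42)
s0 = rng.normal(0.0, 1.0, 1000)   # scores under the normal (null) class
s1 = rng.normal(2.0, 1.0, 1000)   # scores under the anomaly class
t, beta = np_threshold(s0, s1, alpha=0.05)
print(np.mean(s0 >= t) <= 0.05)   # True: the constraint holds by construction
```

The paper's contribution is far stronger — it controls the *population* type I error with high probability while aggregating many classifiers under a convex loss — but the empirical objective-plus-constraint structure is the same.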

========================
Chen Xu (Dept. of Statistics, UBC) and Song Cai (Dept. of Statistics, UBC)

Soft Thresholding-based Screening for Ultra-high Dimensional Feature Spaces
Variable selection and feature extraction are fundamental for knowledge discovery and statistical modeling with high-dimensional data. To reduce the computational burden, variable screening techniques, such as Sure Independence Screening (SIS; Fan and Lv, 2008), are often used before the formal analysis. In this work, we propose another computationally efficient procedure for variable screening through a soft thresholding-based iteration (namely, soft thresholding screening, STS). STS can efficiently screen out most of the irrelevant features (covariates) while keeping the important ones in the model with high probability. With the dimensionality reduced from high to low, the refined model after STS serves as a good starting point for further selection. The excellent performance of STS is supported by various numerical studies.
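A schematic of soft-thresholding-based screening: run a few iterative soft-thresholded gradient steps on the least-squares loss (i.e., ISTA) and retain the features whose coefficients survive. This is a generic sketch under invented data, not the authors' exact STS procedure; the function names and tuning values are assumptions.

```python
import numpy as np

def soft(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def sts_screen(X, y, lam=0.1, n_iter=100):
    """Toy screening via iterative soft thresholding (ISTA) on least
    squares; features with nonzero coefficients survive the screen."""
    n, p = X.shape
    step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1/L, L = largest eigval of X'X
    beta = np.zeros(p)
    for _ in range(n_iter):
        beta = soft(beta + step * X.T @ (y - X @ beta), lam * step)
    return np.flatnonzero(beta)              # indices of retained features

rng = np.random.default_rng(7)
n, p = 100, 1000                              # ultra-high dimension: p >> n
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = 5.0                           # only 3 relevant features
y = X @ beta_true + 0.1 * rng.standard_normal(n)
kept = sts_screen(X, y)
print(len(kept), set(kept.tolist()) >= {0, 1, 2})
```

With a small penalty the screen is deliberately loose: it may keep some irrelevant features, but the strongly relevant ones survive, leaving a reduced model for the formal selection step that follows — the role the abstract assigns to STS.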