Current Issues in the Analysis of Incomplete Longitudinal Data

Poster Presentations

Julie Horrocks, University of Guelph
Prediction of Binary Outcomes from Longitudinal Data

We compare methods for predicting a binary response from longitudinal data with missing values. A simple approach is to use summary measures of the non-missing longitudinal data (such as the mean, slope, or maximum value) as the predictor variable in a logistic regression model. Another approach is to fit a mixed linear model with random slopes and/or intercepts to the longitudinal data, and use the random coefficients as predictors in the logistic regression model. A Bayesian model is also examined. The methods are applied to a data set on adhesion of certain blood lymphocytes (CD56bright cells) in infertile women. It is thought that the shape of the longitudinal profiles of adhesion measurements over time can be used to predict the success of infertility treatments. This research is funded by CHRP and NSERC.
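The summary-measure approach described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `summary_measures`, the toy profiles, and the choice of a least-squares slope are all assumptions made for the example.

```python
import numpy as np

def summary_measures(times, values):
    """Return (mean, slope, maximum) of one subject's non-missing measurements."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    observed = ~np.isnan(values)          # keep only visits with a recorded value
    t, y = times[observed], values[observed]
    if y.size == 0:
        return (np.nan, np.nan, np.nan)
    # A least-squares slope needs at least two distinct time points.
    slope = np.polyfit(t, y, 1)[0] if np.unique(t).size >= 2 else np.nan
    return (float(y.mean()), float(slope), float(y.max()))

# Hypothetical toy data: four visits per subject, NaN marks a missing value.
visit_times = [0.0, 1.0, 2.0, 3.0]
profiles = {
    "subject_a": [1.0, np.nan, 3.0, 4.0],
    "subject_b": [2.0, 2.0, np.nan, np.nan],
}
features = {sid: summary_measures(visit_times, y) for sid, y in profiles.items()}
```

Each tuple in `features` would then supply one row of predictors for a logistic regression of the binary outcome; the regression step itself is omitted here.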

Tulay Koru-Sengul, University of Saskatchewan
Variable Selection Procedures In The Context of Multiple Imputation

Researchers are frequently faced with the problem of analyzing data with missing values. Missing values are practically unavoidable in large longitudinal studies, and incomplete data sets complicate statistical analysis. Multiple imputation is one of the techniques developed to handle missing values, and its use has increased rapidly since it was implemented in several commonly used statistical software packages. Although multiple imputation is not a new method, little work has been done on how to carry out variable selection within its framework. This is an important issue because standard variable selection methods, such as stepwise, backward and forward selection, usually select different variables across the multiply imputed datasets. In this paper, we will study various possible variable selection methods within the framework of multiple imputation. The methods will be outlined and applied to a longitudinal dataset with missing values.
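To make the disagreement across imputed datasets concrete, the sketch below tallies how often each variable is chosen when the same selection procedure is run on every imputed dataset. The variable names, the five selected sets, and the majority rule are hypothetical; the majority rule is only one simple way to reconcile the results and is not necessarily among the methods studied in the poster.

```python
from collections import Counter

def selection_frequency(selected_sets):
    """Fraction of imputed datasets in which each variable was selected."""
    m = len(selected_sets)
    counts = Counter(v for s in selected_sets for v in set(s))
    return {v: c / m for v, c in counts.items()}

# Hypothetical result of running one stepwise procedure on 5 imputed datasets.
selected = [
    {"age", "bmi"},
    {"age", "smoker"},
    {"age", "bmi", "smoker"},
    {"age"},
    {"age", "bmi"},
]
freq = selection_frequency(selected)

# Keep variables chosen in more than half of the imputed datasets.
majority = sorted(v for v, f in freq.items() if f > 0.5)
```

Here no two consecutive datasets yield the same model, which is the difficulty the abstract points to.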

He (Daniel) Li and Liqun Wang, University of Manitoba
Second-order least squares estimation for nonlinear mixed effects models

Estimation of nonlinear mixed effects models is mainly based on the maximum likelihood method. Even with current computing capacity, the intensive numerical integration involved often makes exact maximum likelihood estimation impractical. We propose two estimators for nonlinear mixed effects models where the distributions of the regression random errors are nonparametric and those of the random effects are parametric but not necessarily normal. These estimators are based on the first two conditional moments of the response variable given the observed predictor variables. We present numerical examples demonstrating that these estimators are computationally feasible and practical, and that they perform quite satisfactorily even for relatively small sample sizes.
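As a rough sketch of what an estimator built on the first two conditional moments can look like, the display below writes a generic moment-distance criterion. The notation ($\gamma$, $\mu_i$, $\nu_i$, $W_i$, $Q_N$) is assumed for illustration and is not taken from the poster; the two estimators proposed by the authors may differ in detail.

```latex
\begin{aligned}
\mu_i(\gamma) &= E_\gamma(y_i \mid X_i), \qquad
\nu_i(\gamma) = E_\gamma\{\operatorname{vech}(y_i y_i^{\top}) \mid X_i\}, \\
\rho_i(\gamma) &= \begin{pmatrix} y_i - \mu_i(\gamma) \\ \operatorname{vech}(y_i y_i^{\top}) - \nu_i(\gamma) \end{pmatrix}, \\
Q_N(\gamma) &= \sum_{i=1}^{N} \rho_i(\gamma)^{\top} W_i \, \rho_i(\gamma), \qquad
\hat{\gamma} = \arg\min_{\gamma} Q_N(\gamma).
\end{aligned}
```

Here $y_i$ is the response vector of subject $i$, $X_i$ the observed predictors, $\gamma$ collects all unknown parameters, and $W_i$ is a nonnegative definite weight matrix. Because only moments of $y_i$ enter the criterion, no full distribution for the regression errors has to be specified.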

Zhenguo Qiu, University of Northern British Columbia
Variations in NICU Length of Stay among Survivors in Canadian Neonatal Intensive Care Units, 1996-97

Previous studies have reported variation in length of stay (LOS) in Canadian neonatal intensive care units (NICUs), but little is known about the reasons for this variation. We examine predictors of NICU LOS using Bayesian hierarchical modeling methods. Variations in NICU LOS were examined and quantified, accounting for patient risks at admission, NICU characteristics and patient-NICU interaction.

Forty-five percent of the variation in NICU LOS was attributable to patient risks and 13% to NICU characteristics. Neonatologist-medical staff ratio was partially responsible for longer NICU LOS among neonates with lethal congenital anomalies and neonates with complete maternal antenatal corticosteroid treatment. Availability of specialized services was associated with longer NICU LOS among neonates with complete maternal antenatal corticosteroid treatment. Also, neonates admitted to NICUs with high patient intake volume and high neonatologist-medical staff ratios tended to have longer NICU LOS.

Annie Qu, Oregon State University
Unbiased and efficient estimation functions for correlated data with missing at random

We develop a consistent and highly efficient marginal model for missing at random data using estimating function approaches. Our approach differs from the weighted estimating equations of Robins et al. (1995) and the imputation method of Paik (1997) in that it does not require knowing the missingness mechanism and does not require estimating the probability of missingness from an assumed model. Under the missing at random assumption we are able to formulate unbiased estimating functions, which guarantee an unbiased estimator, and we further show that the unbiased estimating function is efficient using the idea of projection and semiparametric efficiency bounds (Bickel et al., 1993). Our approach requires one to approximate the true variance-covariance of the responses reasonably well. We estimate the variance-covariance using the observed data. This requirement is not very restrictive, because the correlation of the observed data is not affected by the missing at random mechanism; that is, the correlation of the observed data represents the correlation of the complete data in general. Simulation results also indicate that our approach performs better in terms of bias and efficiency than weighted estimating equations and the imputation method.

Peng Zhang, University of Waterloo
Efficient Estimation of Long Term Treatment Effects on Disease Progression Using Non-normal Linear Mixed Models

This paper presents a new class of non-normal linear mixed models that offers efficient estimation of disease progression in the analysis of longitudinal data from the MDRD (Modification of Diet in Renal Disease) trial. The new analysis uses the finding, from both a preliminary data analysis and two previous analyses, that the distribution of the random effects is negatively skewed. We assume a log-gamma distribution for the random effects and provide maximum likelihood inference in the resulting non-normal linear mixed model. To validate the adequacy of the log-gamma assumption versus the usual normality assumption for the random effects, we propose a lack-of-fit test, which clearly indicates a better fit of the log-gamma model in the analysis of the MDRD data. This full maximum likelihood inference is advantageous for dealing with the MAR type of dropouts encountered in the MDRD data.
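A small simulation can illustrate why a log-gamma distribution suits negatively skewed random effects: the logarithm of a gamma variable has a longer left tail than right tail. The shape parameter, sample size, and helper name `sample_skewness` below are assumptions for the illustration; the parameterization used in the poster is not given.

```python
import numpy as np

def sample_skewness(x):
    """Moment-based sample skewness: mean of the cubed standardized values."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3))

rng = np.random.default_rng(0)            # fixed seed for reproducibility
shape = 2.0                               # hypothetical gamma shape parameter
n = 100_000

log_gamma_draws = np.log(rng.gamma(shape, size=n))   # log of gamma variates
normal_draws = rng.normal(size=n)                     # symmetric reference

skew_log_gamma = sample_skewness(log_gamma_draws)     # negative
skew_normal = sample_skewness(normal_draws)           # close to zero
```

The normal draws serve only as a symmetric reference, corresponding to the usual normality assumption that the abstract contrasts with the log-gamma choice.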

The National Program on
Complex Data Structures

October 13-15, 2005
Workshop on Current Issues in the Analysis of Incomplete Longitudinal Data
held at the Fields Institute, 222 College Street, Toronto
