The National Program on
Complex Data Structures


October 13-15, 2005
Workshop on Current Issues in the Analysis of Incomplete Longitudinal Data
held at the Fields Institute, 222 College Street, Toronto


Paul S. Albert, National Institutes of Health
Transitional, Random Effects, and Latent Processes Approaches for Analyzing Longitudinal Data with Missingness: A Comparison of Approaches With Applications to an Opiates Clinical Trial

This talk will focus on a comparison of techniques for analyzing an opiate clinical trial dataset. The trial randomized 162 patients to one of three treatment arms: an experimental buprenorphine arm and arms associated with two dose levels of methadone. Our focus is on the comparison of the buprenorphine arm with the low dose methadone arm. Patients were followed thrice weekly for 17 weeks after randomization, with the outcome being whether an addict was positive on each of the repeated urine tests. The primary statistical endpoints in this trial were the overall proportions of positive urine tests and the mean number of visits to the first occurrence of a positive urine test 4 weeks after randomization. Thus, we were interested in both marginal and transitional inferences. A complication in this analysis was the large percentage of patients who dropped out early and who had intermittent missingness. For example, the proportion of dropout by the end of follow-up was 80% in the methadone group and 59% in the buprenorphine group. Further, based on substantive grounds and empirical evidence, the missing data mechanism is likely non-ignorable. We will discuss a number of approaches for analyzing these longitudinal binary data, accounting for non-ignorable missing data. First, we will discuss a transitional approach which incorporates a Markov model for both the response and the missing data mechanisms. This approach will incorporate a selection model to account for non-ignorable intermittent missingness and dropout. Second, we will discuss approaches which account for non-ignorable missingness by linking the response model to the missing data model through shared random effects. Third, we will discuss a modeling approach which links the two processes through a shared continuous-time random process. Fourth, a Markov model with shared random effects will be discussed. All these approaches will be used to analyze the opiate clinical trial dataset.
The consistency of inferences on treatment differences between the methods provides a type of sensitivity analysis.

K. Carriere Chough, University of Alberta
Analysis of repeated measures data with missing values: An overview of methods

Assuming that the data are missing at random, we discuss analysis approaches to repeated measures data. We first discuss the importance of recognizing appropriate missing data mechanisms. We then review multiple imputation methods and compare their advantages and disadvantages with those of other missing data methods for repeated measures data. Generally, multiple imputation approaches are less powerful, although they may offer computational convenience. However, as long as all available data are used in the analysis, any approach appears to result in consistent and efficient analysis. We also recognize that there is a need to further develop analysis methods, suitable for small sample repeated measures data, that do not depend on large sample theory.
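
As a small illustration of the multiple imputation step reviewed above, the sketch below pools results from several imputed data sets with Rubin's combining rules. The function name pool_rubin and the numbers in the usage lines are hypothetical and are not taken from the talk.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine point estimates and their variances from m imputed data sets."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical estimates of one coefficient from three imputed data sets.
q_bar, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q_bar, total)
```

The between-imputation term is what carries the extra uncertainty due to the missing values; with few imputations it is inflated by the factor 1 + 1/m.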

Vern Farewell, MRC Biostatistics Unit
A multi-state model for related outcome data with different censoring distributions

Serious coronary heart disease (CHD) is a primary outcome in the Whitehall II study, a large epidemiological study of British civil servants. Both fatal and non-fatal CHD events are of interest and while essentially complete information is available on fatal events, the observation of non-fatal events is subject to potentially informative censoring. The use of a multi-state model for the analysis of such data is investigated. A particular focus is on the relationship between civil service grade and CHD events.

Paul Gustafson, University of British Columbia
Issues in Measurement Error Adjustment

One common way in which exposure-outcome data can be incomplete is if the exposure variable is poorly measured. This is a common problem in both longitudinal and non-longitudinal settings, and it is well-known that pretending the poor measurements are good can give misleading inferences. Thus there is a considerable literature on methods which adjust for measurement error or misclassification in explanatory variables. Two substantial issues in the literature are as follows. First, there is debate about how 'parametric' one should be when adjusting for measurement error. Second, there can be a gap between what might realistically be assumed about the measurement error mechanism in practice, and what has to be assumed to obtain a formally identified model. I will comment on both these issues. I will illustrate the identifiability issue in a scenario where a putative instrumental variable is available in addition to the surrogate exposure variable.

X. Joan Hu, Simon Fraser University
Estimation from Incomplete Longitudinal Data: What We Learn from Event History Data Analyses

There are many well-established methods in event history data analysis. Learning from them, we explore estimation procedures that offer alternatives to those in the longitudinal analysis literature. In this talk, I will focus on nonparametric and semiparametric estimation from longitudinal data with random missingness. Situations with informative missingness will be discussed at some length.

Hyang Mi Kim, University of Alberta
Impact of using a grouping strategy with mismeasured exposures in logistic and Cox proportional hazards models, and some improvement by a Bayesian approach in logistic models

In occupational epidemiology, it is often possible to obtain repeated measurements of exposure from only a sample of workers who belong to exposure groups associated with different levels of exposure in a cohort with known health outcomes. Average exposures from a sample of workers can be assigned to all members of that group, including those who were not sampled, leading to a group-based exposure assessment strategy.

We show how this group-based exposure assessment with mismeasured exposures leads to measurement error that is of Berkson type when the number of subjects with exposure measurements in each group is large, and how it can be shown that the error variance approximates the between-worker variance. We next study the implication of this for slope parameter estimation in logistic and Cox proportional-hazards models. Under the normality assumption for exposures and with a moderately large number of workers in each group, there is attenuation in the estimate of relative risk, the magnitude of which depends on the size of the between-worker variance and the true association parameter. Approximate equations for the attenuation have been derived under some conditions in logistic and Cox proportional-hazards models. These equations show that the attenuation in Cox proportional-hazards models is generally more severe than that in logistic regression. Furthermore, when the between-worker variability is large, our simulation study found that the attenuation should not be ignored in either model. Subsequently we developed a method to adjust for measurement error in such cases. We apply a Bayesian Berkson errors-in-variables model to reduce the attenuation for large between-worker variance in logistic models. The results show that the Bayesian Berkson approach for the grouping strategy gives improved estimates when the measurement error variance is large and is superior to naïve analysis with group-based exposure assessment.
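
To give a feel for how the attenuation depends on the error variance and on the true slope, the sketch below uses a standard textbook approximation for logistic regression with normal Berkson error. It is not claimed to be the approximate equation derived by the authors; the function name and the example values are hypothetical.

```python
import math

# Constant from the usual probit-type approximation to the logistic function.
C = 16 * math.sqrt(3) / (15 * math.pi)


def attenuated_slope(beta, error_variance):
    """Approximate slope seen when the group mean replaces the true exposure.

    Assumes true exposure = assigned group mean + N(0, error_variance) error
    (Berkson type) and a logistic model with slope beta for the true exposure.
    """
    return beta / math.sqrt(1 + (C * beta) ** 2 * error_variance)


# Hypothetical values: the ratio to the true slope shrinks as either the
# error variance or the true slope grows.
for beta in (0.5, 1.0, 2.0):
    for var in (0.0, 1.0, 4.0):
        print(beta, var, round(attenuated_slope(beta, var) / beta, 3))
```

Under this approximation there is no attenuation when the error variance is zero, and the attenuation factor depends only on the product of the squared slope and the error variance.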

Jerry Lawless, University of Waterloo
Issues in the Use of Multi-State Models for Event History Analysis

This talk will first survey some of the main applications of multi-state models in event history analysis. The flexibility of such models in describing features such as interactions between events, history-dependent losses to followup, and cumulative cost histories will be discussed. Standard parametric and nonparametric methods of analysis will be reviewed briefly, followed by a discussion of some areas where there are currently gaps in methodology for dealing with incomplete data.

Xihong Lin, Harvard University
Modeling the Association between age at a marker event and age at menopause using a varying-coefficient model and a cross-ratio model

It is of recent interest in reproductive health research to investigate the validity of a marker event for the onset of menopausal transition and the association between age at a marker event and age at menopause. Formal statistical analysis of this dependence is challenged by the fact that both the marker event and menopause are subject to right censoring and their association depends on age at the marker event. We propose two approaches to investigate this and discuss the pros and cons of each approach. We first discuss a varying-coefficient Cox model that regresses age at menopause on age at the marker event using a regression spline. We next discuss a piecewise cross-ratio model that measures their dependence by assuming the cross-ratio to be a piecewise constant function of age at onset of the marker event. We propose two estimation procedures, termed the direct two-stage method and the sequential two-stage method; the latter is extended to allow for covariates in the marginal survival functions. The proposed methods are applied to the analysis of the Tremin Trust data, and their performance is evaluated using simulations.

Todd A. MacKenzie, Dartmouth Medical School
The Use of Auxiliary Variables or Markers in Clinical Trials with a Survival Endpoint

Markers, which are prognostic longitudinal variables, can be used as auxiliary variables to replace some of the information lost due to right censoring. They may also be used to remove or reduce bias due to informative censoring. We review and propose novel methods for incorporating information from either categorical or continuous markers into estimates of survival, two sample test statistics and estimates of the hazard ratio. Using simulations, we show that these estimators and tests can be up to 30% more efficient than the usual estimators and tests, if the marker is highly prognostic and if the frequency of censoring is high.

Jason Nielsen, Simon Fraser University
Mixed Nonhomogeneous Poisson Process Spline Models for the Analysis of Recurrent Event Panel Data

A flexible semiparametric model for analyzing longitudinal panel count data is presented. Panel count data refers here to count data on recurrent events collected as the number of events which have occurred within specific followup periods. The model assumes that the counts for each subject are generated by a nonhomogeneous Poisson process with a smooth intensity function. Such smooth intensities are modeled with adaptive splines. Both random and discrete mixtures of intensities are considered to account for complex correlation structures, heterogeneity and hidden subpopulations common to this type of data. An estimating equation approach to inference requiring only low moment assumptions is developed and the method is illustrated on several data sets.
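
The data structure described above can be sketched by simulation: under a nonhomogeneous Poisson process, the count in each followup period is Poisson with mean equal to the increment of the cumulative intensity over that period. The exponential intensity, the gamma frailty used as a random mixture, and all names and parameter values below are assumptions made for illustration; the talk's adaptive spline intensities are not reproduced.

```python
import math
import random


def cumulative_intensity(t, a=0.5, b=0.1):
    """Cumulative intensity for the assumed intensity a * exp(b * t)."""
    return a / b * (math.exp(b * t) - 1.0)


def poisson_draw(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_panel_counts(visit_times, rng, frailty=1.0):
    """Counts of events between successive visits for one subject."""
    counts, previous = [], 0.0
    for t in visit_times:
        increment = cumulative_intensity(t) - cumulative_intensity(previous)
        counts.append(poisson_draw(frailty * increment, rng))
        previous = t
    return counts


rng = random.Random(2005)
for subject in range(3):
    frailty = rng.gammavariate(2.0, 0.5)  # hypothetical random effect, mean 1
    print(simulate_panel_counts([1.0, 2.0, 4.0], rng, frailty))
```

Only the counts per period are recorded, not the event times, which is what distinguishes panel count data from fully observed recurrent event data.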

James Robins, Harvard University
Robust and Honest Confidence Intervals with Longitudinal Missing Data: Application of a Unified Theory of Parametric, Semi and Nonparametric Statistics Based On Higher Dimensional Influence Functions

Suppose continuous variables L(1),…,L(k), measured at corresponding times 1,…,k, are right censored, with the censoring mechanism known to satisfy coarsening at random. Suppose we wish to estimate the mean of L(k). A recent advance is the development of doubly robust (DR) estimators that are n^(1/2)-consistent (the usual parametric rate) if either (but not necessarily both) (i) a 'working' regression (OR) model for the regression of each L(m) on the past or (ii) a working model for the hazard of censoring given the past L(m) is correct. However, DR estimators are inconsistent if, as is inevitable, both working models are misspecified. Further, due to lack of power, it is often not possible to effectively test whether the working models are sufficiently close to being correct to guarantee small bias. Thus it seems a more honest assessment of uncertainty to use confidence intervals that will include the true mean at their nominal coverage rate under weaker assumptions than those needed for the DR estimators, even at the price of shrinking to zero (with increasing sample size) at a rate slower than the usual n^(-1/2) parametric rate.

We accomplish this goal by introducing estimators and confidence intervals for the mean based on higher dimensional U-statistics. These estimators are derived using a new unified theory of parametric, semiparametric, and nonparametric statistics based on higher order scores (i.e., derivatives of the likelihood) and influence functions. The theory applies equally to root-n and non-root-n problems, reproduces the results previously obtained by the modern theory of nonparametric inference, produces many new non-root-n results, and, most importantly, opens up the ability to perform optimal non-root-n inference in complex high dimensional models.
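
For orientation, the first-order doubly robust estimator that serves as the starting point above can be sketched for the simplest case of a single time point. This is a deliberately reduced illustration, assuming the fitted values from the two working models are supplied by the caller; it is not the higher-order U-statistic estimator of the talk, and all names and numbers are hypothetical.

```python
def doubly_robust_mean(observed, outcomes, propensities, predictions):
    """Augmented inverse-probability-weighted estimate of a mean.

    observed     -- 1 if the outcome was seen, 0 if it is missing
    outcomes     -- outcome values (ignored where observed is 0)
    propensities -- working-model probabilities of being observed
    predictions  -- working-model predictions of the outcome
    """
    total = 0.0
    for r, y, pi, m in zip(observed, outcomes, propensities, predictions):
        weighted = y / pi if r else 0.0  # inverse-probability-weighted term
        total += weighted - (r - pi) / pi * m  # augmentation term
    return total / len(observed)


# Hypothetical data: the predictions match every observed outcome exactly,
# so the estimate equals the mean prediction whatever propensities are used.
observed = [1, 0, 1, 0]
outcomes = [2.0, None, 4.0, None]
predictions = [2.0, 3.0, 4.0, 5.0]
print(doubly_robust_mean(observed, outcomes, [0.9, 0.2, 0.5, 0.7], predictions))
```

The usage lines show one half of the double robustness algebraically: when the outcome predictions are right, errors in the propensities cancel. The other half, where the propensities are right and the predictions are wrong, holds on average rather than exactly in a given sample.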

Jeremy Taylor, University of Michigan
Joint models for longitudinal and survival data, with application to prostate cancer

In this talk I will present an overview of joint models for longitudinal biomarker data and event times, and discuss an application and possible extensions. These models present a general way to describe such data. From these general models a variety of possible issues can be addressed, including inference about the parameters in the survival model and inference about the parameters in the longitudinal model. The model can also provide information about whether the longitudinal biomarker could be useful as a surrogate endpoint or auxiliary variable in a clinical trial. The models can also provide a basis for imputation of missing longitudinal data or event times. The typical form of the longitudinal model is a random effects or stochastic process model, and the typical form of the survival model is a proportional hazards model where the hazard depends on the "current true value" of the longitudinal variable. Estimation can be performed either in a 2-stage procedure or in a likelihood based way (either MLE or Bayesian). I will present a prostate cancer application where joint models have been fit. The longitudinal variable is PSA measurements following radiation therapy for prostate cancer, and the event time is recurrence of the disease. Extensions of the model to include a cured fraction, semi-parametric longitudinal models and hazard models that depend on the derivative of the longitudinal variable will be discussed.
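
One common way to write the typical form described above is given below as a sketch. The notation is assumed for illustration rather than taken from the talk: Y_i is the observed biomarker for subject i, X_i its "current true value", b_{0i} and b_{1i} subject-level random effects, and \lambda_0 a baseline hazard. The linear trajectory is only one possible choice of longitudinal model.

```latex
\begin{align*}
Y_i(t_{ij}) &= X_i(t_{ij}) + \varepsilon_{ij},
  \qquad X_i(t) = b_{0i} + b_{1i}\, t,
  \qquad \varepsilon_{ij} \sim N(0, \sigma^2), \\
\lambda_i(t) &= \lambda_0(t) \exp\{\gamma\, X_i(t)\}.
\end{align*}
```

Here \gamma measures how strongly the hazard depends on the current true value; the extension mentioned at the end of the abstract would add a term in the derivative X_i'(t) to the exponent.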

D. Roland Thomas, Carleton University
Latent Variable and Measurement Error Modeling in the Social Sciences

In the social sciences, latent variables (LVs) occupy the role played by measurement errors in the physical and medical sciences. This talk will briefly outline the social science approach to LV modeling, with a focus on limited information methods. For a variety of reasons, many practitioners still use a simple two-step approach to LV modeling, in which predicted LV scores are used as proxies in ordinary least squares (OLS) regression, leading in many cases to substantial bias. Simple scoring methods based on classical test theory (CTT) and non-linear scoring methods based on item response theory (IRT) will be described, and bias results based on a limited theoretical investigation will be presented and illustrated using simulation. An alternative approach (Bollen, 1996) featuring the adaptation of instrumental variables and two-stage least squares (2SLS) methods to social science problems will then be described and compared to other methods using simulation. Some theoretical difficulties with the 2SLS approach will also be briefly discussed. Finally, a 2SLS approach to probit modeling with latent predictor variables will be outlined. This approach differs from the well known methods of Carroll, Ruppert and Stefanski (1995) in that it yields consistent parameter estimates irrespective of the magnitude of the measurement errors. Most of the work described in the talk is joint with Irene Lu, of York University. The probit regression work is joint with Ken Bollen (UNC), Liqun Wang (Manitoba) and John Hipp (UNC).

Grace Y. Yi, University of Waterloo
Methods on Recurrent Event Data with Mismeasured Covariates

Recurrent event data arise often in longitudinal studies when events occur repeatedly over time. Various methods have been developed for the analysis of recurrent event data. Those methods include intensity-based counting process methods, mean function-based estimating equation methods and the analysis of times to events or times between events. The validity of those methods relies on the assumption that the covariates are correctly measured. This assumption, however, is not satisfied in many practical problems. It is often the case that some covariates are subject to measurement error. In this talk we will first briefly review the analysis of recurrent event data and then focus the discussion on inferential methods which account for measurement error in covariates. This is based on joint work with Jerry Lawless.

Lang Wu, University of British Columbia
Nonlinear Mixed-Effects Models with Dropouts and Missing Covariates

Nonlinear mixed-effects models (NLME) are popular in longitudinal studies. In these studies, however, subjects may drop out early and covariates may contain missing data. We propose likelihood and approximate methods for NLME models with dropouts and missing covariates, using Monte-Carlo EM algorithms and MCMC methods. The approximate method is computationally more efficient than the likelihood method. A real dataset is analyzed using the proposed methods.

Leilei Zeng, Simon Fraser University
Challenges in Transitional Analysis of Longitudinal Data

Longitudinal studies often involve monitoring a dynamic process, and scientific interest lies in modeling covariate effects on the rates of transitions between states (e.g. Muenz and Rubinstein, 1985). Transition models that assume an underlying Markov process are often used.
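
As a minimal sketch of the first-order Markov idea, the code below estimates transition proportions for a binary process from the consecutive pairs in which both visits were observed. Covariates are ignored, and dropping incomplete pairs is justified only under restrictive assumptions about the missingness; the function name and the example sequences are hypothetical.

```python
def transition_proportions(sequences):
    """Empirical P(next state | current state) for binary sequences.

    Missing visits are coded as None; any pair with a missing member is
    skipped, which is an assumption of this sketch rather than a
    recommended analysis.
    """
    counts = {0: [0, 0], 1: [0, 0]}
    for seq in sequences:
        for prev, curr in zip(seq, seq[1:]):
            if prev is None or curr is None:
                continue
            counts[prev][curr] += 1
    result = {}
    for state, row in counts.items():
        total = sum(row)
        result[state] = [c / total for c in row] if total else None
    return result


# Two hypothetical subjects; the first misses the fourth visit.
print(transition_proportions([[0, 1, 1, None, 0], [1, 1, 0, 0]]))
```

A regression version would replace each row of proportions with a model for the transition probability given covariates and the previous state.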

Our research focuses on settings of longitudinal transitional data with more complex types of association structure. Examples include i) cluster-randomized trials of families, school-based intervention studies, ii) multivariate multi-state processes, and iii) spatially correlated data. Under these settings, not only longitudinal but also cross-sectional associations must be taken into account. We give estimating equations for joint estimation and inference with transitional models for multivariate/clustered longitudinal multi-state data. The methods are based on GEE2 (Zhao and Prentice, 1990) and alternating logistic regression (Carey et al., 1993). These approaches enable one to model covariate effects on marginal transition probabilities as well as the association parameters, and improved efficiency can also result from the joint estimation.

Several statistical problems of interest are under investigation: methods for spatially correlated longitudinal multi-state data; random effects or mixed transitional models for cluster-randomized studies; and guidelines for the design of studies based on transitional models. Missing data are another common problem in longitudinal studies, and methods for handling incomplete data with transitional models in the above settings are also of interest.

