October 13-15, 2005
Paul S. Albert, National Institutes of Health
This talk will focus on a comparison of techniques for analyzing an opiate
clinical trial dataset. The trial randomized 162 patients to one of three
treatment arms: an experimental buprenorphine arm and arms associated
with two dose levels of methadone. Our focus is on the comparison of the
buprenorphine arm with the low dose methadone arm. Patients were followed
thrice weekly for 17 weeks after randomization with the outcome being
whether an addict was positive on each of the repeated urine tests. The
primary statistical endpoints in this trial were the overall proportions
of positive urine tests and the mean number of visits to the first occurrence
of a positive urine test 4 weeks after randomization. Thus, we were interested
in both marginal and transitional inferences. A complication in this analysis
was the large percentage of patients who dropped out early and who had
intermittent missingness. For example, the proportion of dropout by the
end of follow-up was 80% in the methadone group and 59% in the buprenorphine
group. Further, based on substantive grounds and empirical evidence, the
missing data mechanism is likely non-ignorable. We will discuss a number
of approaches for analyzing these longitudinal binary data, accounting
for non-ignorable missing data. First, we will discuss a transitional
approach which incorporates both a Markov model for the response and the
missing data mechanisms. This approach will incorporate a selection model
to account for non-ignorable intermittent missingness and dropout. Second,
we will discuss approaches which account for non-ignorable missingness
by linking the response model to the missing data model through shared
random effects. Third, we will discuss a modeling approach which links
the two processes through a shared continuous-time random process. Fourth,
a Markov model with shared random effects will be discussed. All these
approaches will be used to analyze the opiate clinical trial dataset.
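As a rough sketch of the first of these approaches (generic notation, not necessarily the exact specification used in the talk), a first-order Markov model for the binary urine-test outcome Y_{ij} can be paired with a selection model for the missingness indicator R_{ij}:

    \[
    \operatorname{logit} P(Y_{ij} = 1 \mid Y_{i,j-1}, z_i) = \alpha + \beta\, Y_{i,j-1} + \gamma\, z_i,
    \qquad
    \operatorname{logit} P(R_{ij} = 0 \mid Y_{ij}, Y_{i,j-1}) = \psi_0 + \psi_1 Y_{ij} + \psi_2 Y_{i,j-1},
    \]

where z_i denotes treatment arm. A nonzero \psi_1 makes the missingness non-ignorable, since the probability of a missed visit then depends on the possibly unobserved current response.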
Assuming that data are missing at random, we discuss approaches to the analysis of repeated measures data. We first discuss the importance of recognizing the appropriate missing data mechanism. We then review multiple imputation methods and compare their advantages and disadvantages with those of other missing data methods for repeated measures data. Generally, multiple imputation approaches are less powerful, although they may offer computational convenience. However, as long as all available data are used in the analysis, any of these approaches appears to yield consistent and efficient inference. We also recognize the need to further develop analysis methods, not dependent on large-sample theory, that are suitable for repeated measures data with small samples.
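As a minimal sketch of the multiple imputation workflow reviewed here (the normal imputation model, variable names, and missingness rate are all illustrative assumptions; a proper implementation would also draw the imputation-model parameters from their posterior):

    import numpy as np

    rng = np.random.default_rng(0)

    # Simulated repeated-measures outcome with values missing at random.
    n, t = 200, 4
    y = rng.normal(5.0, 1.0, size=(n, t))
    y[rng.random((n, t)) < 0.2] = np.nan       # roughly 20% MAR missingness

    m = 10                                      # number of imputations
    estimates, variances = [], []
    for _ in range(m):
        y_imp = y.copy()
        for j in range(t):
            obs = ~np.isnan(y[:, j])
            mu, sd = y[obs, j].mean(), y[obs, j].std(ddof=1)
            # Draw imputations from the estimated observed-data distribution.
            y_imp[~obs, j] = rng.normal(mu, sd, size=(~obs).sum())
        estimates.append(y_imp.mean())          # analysis step: overall mean
        variances.append(y_imp.var(ddof=1) / (n * t))

    # Pool with Rubin's rules: total variance = within + (1 + 1/m) * between.
    qbar = np.mean(estimates)
    total_var = np.mean(variances) + (1 + 1 / m) * np.var(estimates, ddof=1)
    print(qbar, total_var ** 0.5)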
Serious coronary heart disease (CHD) is a primary outcome in the Whitehall II study, a large epidemiological study of British civil servants. Both fatal and non-fatal CHD events are of interest; while essentially complete information is available on fatal events, the observation of non-fatal events is subject to potentially informative censoring. The use of a multi-state model for the analysis of such data is investigated. A particular focus is on the relationship between civil service grade and CHD events.
One common way in which exposure-outcome data can be incomplete is if the exposure variable is poorly measured. This is a common problem in both longitudinal and non-longitudinal settings, and it is well-known that pretending the poor measurements are good can give misleading inferences. Thus there is a considerable literature on methods which adjust for measurement error or misclassification in explanatory variables. Two substantial issues in the literature are as follows. First, there is debate about how 'parametric' one should be when adjusting for measurement error. Second, there can be a gap between what might realistically be assumed about the measurement error mechanism in practice, and what has to be assumed to obtain a formally identified model. I will comment on both these issues. I will illustrate the identifiability issue in a scenario where a putative instrumental variable is available in addition to the surrogate exposure variable.
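As a small simulation sketch of the instrumental-variable idea (the linear outcome model, error variances, and variable names are illustrative assumptions, not the talk's setup):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 100_000

    x = rng.normal(size=n)                  # true exposure (unobserved)
    w = x + rng.normal(scale=1.0, size=n)   # surrogate: classical error
    z = x + rng.normal(scale=1.0, size=n)   # instrument: correlated with x,
                                            # error independent of w's error
    y = 2.0 * x + rng.normal(size=n)        # outcome; true slope is 2

    beta_naive = np.cov(y, w)[0, 1] / np.var(w)         # attenuated toward 0
    beta_iv = np.cov(y, z)[0, 1] / np.cov(w, z)[0, 1]   # consistent
    print(beta_naive, beta_iv)              # roughly 1.0 versus 2.0 here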
There are many well-established methods in event history data analysis. Learning from them, we explore estimation procedures that offer alternatives to those in the longitudinal analysis literature. In this talk, I will focus on nonparametric and semiparametric estimation from longitudinal data with data missing at random. Situations with informative missingness will be discussed at some length.
Hyang Mi Kim, University of Alberta
In occupational epidemiology, it is often possible to obtain repeated measurements of exposure from only a sample of workers who belong to exposure groups associated with different levels of exposure in a cohort with known health outcomes. Average exposures from a sample of workers can be assigned to all members of that group, including those who were not sampled, leading to a group-based exposure assessment strategy.
We show how this group-based exposure assessment with mismeasured exposures leads to measurement error of Berkson type when the number of subjects with exposure measurements in each group is large, and how it can be shown that the error variance approximates the between-worker variance. We next study the implications of this for slope parameter estimation in logistic and Cox proportional-hazards models. Under the normality assumption of exposures and with a moderately large number of workers in each group, there is attenuation in the estimate of relative risk, the magnitude of which depends on the size of the between-worker variance and the true association parameter. Approximate equations for the attenuation have been derived under some conditions in logistic and Cox proportional-hazards models. These equations show that the attenuation in Cox proportional-hazards models is generally more severe than that in logistic regression. Furthermore, when the between-worker variability is large, our simulation study found that the attenuation should not be ignored in either model. Subsequently, we developed a method to adjust for measurement error in such cases. We apply a Bayesian Berkson errors-in-variables model to reduce the attenuation for large between-worker variance in logistic models. The results show that the Bayesian Berkson approach for the grouping strategy gives improved estimates when the measurement error variance is large and is superior to naïve analysis with group-based exposure assessment.
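The attenuation described here can be seen in a small simulation sketch (group sizes, variances, and the true log odds ratio are illustrative assumptions):

    import numpy as np
    from scipy.special import expit
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    G, n_per = 20, 500                          # exposure groups, workers per group

    group_mean = rng.normal(0.0, 1.0, size=G)   # between-group variation
    assigned = np.repeat(group_mean, n_per)     # group-based exposure assignment
    true_x = assigned + rng.normal(0.0, 1.0, size=G * n_per)
    # The between-worker deviation left out of `assigned` is Berkson-type error.

    beta = 1.5                                  # true log odds ratio
    y = rng.binomial(1, expit(-1.0 + beta * true_x))

    fit = sm.Logit(y, sm.add_constant(assigned)).fit(disp=0)
    print(fit.params[1])   # attenuated below 1.5 when between-worker variance is large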
This talk will first survey some of the main applications of multi-state models in event history analysis. The flexibility of such models in describing features such as interactions between events, history-dependent losses to followup, and cumulative cost histories will be discussed. Standard parametric and nonparametric methods of analysis will be reviewed briefly, followed by a discussion of some areas where there are currently gaps in methodology for dealing with incomplete data.
It is of recent interest in reproductive health research to investigate the validity of a marker event for the onset of the menopausal transition and the association between age at a marker event and age at menopause. Formal statistical analysis of this dependence is challenged by the fact that both the marker event and menopause are subject to right censoring and their association depends on age at the marker event. We propose two approaches to investigate this and discuss the pros and cons of each. We first discuss a varying coefficient Cox model, regressing age at menopause on age at the marker event using a regression spline. We next discuss a piecewise cross-ratio model that measures their dependence by assuming the cross-ratio to be a piecewise constant function of age at onset of the marker event. We propose two estimation procedures, termed the direct two-stage method and the sequential two-stage method; the latter is extended to allow for covariates in the marginal survival functions. The proposed methods are applied to the analysis of the Tremin Trust data, and their performance is evaluated using simulations.
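For reference, the cross-ratio between age at the marker event T_1 and age at menopause T_2 is the standard local dependence measure

    \[
    \theta(t_1, t_2) = \frac{\lambda_2(t_2 \mid T_1 = t_1)}{\lambda_2(t_2 \mid T_1 > t_1)},
    \]

and the piecewise model above takes \theta(t_1, t_2) = \theta_k whenever t_1 falls in the k-th interval of age at onset of the marker event.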
Markers, which are prognostic longitudinal variables, can be used as auxiliary variables to replace some of the information lost due to right censoring. They may also be used to remove or reduce bias due to informative censoring. We review and propose novel methods for incorporating information from either categorical or continuous markers into estimates of survival, two-sample test statistics and estimates of the hazard ratio. Using simulations, we show that these estimators and tests can be up to 30% more efficient than the usual estimators and tests, if the marker is highly prognostic and if the frequency of censoring is high.
A flexible semiparametric model for analyzing longitudinal panel count data is presented. Panel count data refers here to count data on recurrent events collected as the number of events which have occurred within specific follow-up periods. The model assumes that the counts for each subject are generated by a nonhomogeneous Poisson process with a smooth intensity function. Such smooth intensities are modeled with adaptive splines. Both random and discrete mixtures of intensities are considered to account for complex correlation structures, heterogeneity and hidden subpopulations common to this type of data. An estimating equation approach to inference requiring only low moment assumptions is developed and the method is illustrated on several data sets.
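A minimal sketch of the data-generating mechanism the model assumes, with an illustrative smooth intensity (counts over disjoint follow-up periods are independent Poisson with mean equal to the integrated intensity):

    import numpy as np
    from scipy.integrate import quad

    rng = np.random.default_rng(3)

    def intensity(t):
        """A smooth, nonhomogeneous event intensity (illustrative choice)."""
        return 0.5 + 0.3 * np.sin(t)

    # Follow-up periods (panels) for one subject, in years.
    panels = [(0.0, 0.5), (0.5, 1.2), (1.2, 2.0)]

    # Count in each panel is Poisson with mean = integral of the intensity.
    counts = []
    for a, b in panels:
        mean_count, _ = quad(intensity, a, b)
        counts.append(rng.poisson(mean_count))
    print(counts)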
James Robins, Harvard University
Suppose continuous longitudinal data L(1), ..., L(k), measured at corresponding times 1, ..., k, are right censored with the censoring mechanism known to satisfy coarsening at random. Suppose we wish to estimate the mean of L(k). A recent advance is the development of doubly robust (DR) estimators that are n^{1/2}-consistent (the usual parametric rate) if either (but not necessarily both) (i) a 'working' outcome regression (OR) model for the regression of each L(m) on the past or (ii) a working model for the hazard of censoring given the past is correct. However, DR estimators are inconsistent if, as is inevitable, both working models are misspecified. Further, due to lack of power, it is often not possible to effectively test whether the working models are sufficiently close to being correct to guarantee small bias. Thus it seems a more honest assessment of uncertainty to use confidence intervals that will include the true mean at their nominal coverage rate under weaker assumptions than those required for the DR estimators, even at the price of shrinking to zero (with increasing sample size) at a rate slower than the usual n^{-1/2} parametric rate.
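For orientation, in the simplest single-occasion version of this problem (estimating the mean of an outcome L observed only when R = 1, with baseline covariates X), the doubly robust estimator takes the familiar augmented inverse-probability-weighted form

    \[
    \hat{\mu}_{DR} = \frac{1}{n} \sum_{i=1}^{n} \left[ \frac{R_i L_i}{\hat{\pi}(X_i)} - \frac{R_i - \hat{\pi}(X_i)}{\hat{\pi}(X_i)}\, \hat{m}(X_i) \right],
    \]

which is consistent if either the censoring model \hat{\pi}(x) for P(R = 1 \mid X = x) or the outcome regression \hat{m}(x) for E[L \mid X = x] is correctly specified.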
In this talk I will present an overview of joint models for longitudinal biomarker data and event times, and discuss an application and possible extensions. These models provide a general way to describe such data. From these general models a variety of issues can be addressed, including inference about the parameters in the survival model and inference about the parameters in the longitudinal model. The model can also provide information about whether the longitudinal biomarker could be useful as a surrogate endpoint or auxiliary variable in a clinical trial. The models can also provide a basis for imputation of missing longitudinal data or event times. The typical form of the longitudinal model is a random effects or stochastic process model, and the typical form of the survival model is a proportional hazards model where the hazard depends on the "current true value" of the longitudinal variable. Estimation can be performed either in a two-stage procedure or in a likelihood-based way (either MLE or Bayesian). I will present a prostate cancer application where joint models have been fit. The longitudinal variable is PSA measurements following radiation therapy for prostate cancer, and the event time is recurrence of the disease. Extensions of the model to include a cured fraction, semi-parametric longitudinal models and hazard models that depend on the derivative of the longitudinal variable will be discussed.
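A common concrete form of such a joint model (generic notation, not necessarily the exact specification fit in the prostate cancer application) is

    \[
    Y_{ij} = X_i(t_{ij}) + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim N(0, \sigma^2),
    \qquad
    \lambda_i(t) = \lambda_0(t) \exp\{ \gamma\, X_i(t) \},
    \]

where X_i(t) is the subject's true, error-free biomarker trajectory (typically a random effects or stochastic process model) and the hazard of the event depends on its current value.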
In the social sciences, latent variables (LVs) occupy the role played by measurement errors in the physical and medical sciences. This talk will briefly outline the social science approach to LV modeling, with a focus on limited information methods. For a variety of reasons, many practitioners still use a simple two-step approach to LV modeling, in which predicted LV scores are used as proxies in ordinary least squares (OLS) regression, leading in many cases to substantial bias. Simple scoring methods based on classical test theory (CTT) and non-linear scoring methods based on item response theory (IRT) will be described, and bias results based on a limited theoretical investigation will be presented and illustrated using simulation. An alternative approach (Bollen, 1996) featuring the adaptation of instrumental variables and two-stage least squares (2SLS) methods to social science problems will then be described and compared to other methods using simulation. Some theoretical difficulties with the 2SLS approach will also be briefly discussed. Finally, a 2SLS approach to probit modeling with latent predictor variables will be outlined. This approach differs from the well-known methods of Carroll, Ruppert and Stefanski (1995) in that it yields consistent parameter estimates irrespective of the magnitude of the measurement errors. Most of the work described in the talk is joint with Irene Lu, of York University. The probit regression work is joint with Ken Bollen (UNC), Liqun Wang (Manitoba) and John Hipp (UNC).
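A minimal simulation sketch of the two-stage least squares idea, using one indicator of the latent variable as the proxy and a second as the instrument (all coefficients and error variances are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 50_000

    xi = rng.normal(size=n)                  # latent variable (unobserved)
    x1 = xi + rng.normal(scale=0.8, size=n)  # indicator used as the proxy
    x2 = xi + rng.normal(scale=0.8, size=n)  # second indicator: the instrument
    y = 1.0 + 0.7 * xi + rng.normal(size=n)  # structural equation, slope 0.7

    # Stage 1: regress the proxy on the instrument.
    Z = np.column_stack([np.ones(n), x2])
    x1_hat = Z @ np.linalg.lstsq(Z, x1, rcond=None)[0]

    # Stage 2: OLS of y on the fitted proxy.
    X = np.column_stack([np.ones(n), x1_hat])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    print(beta[1])   # near 0.7; naive OLS on x1 would be attenuated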
Recurrent event data arise often in longitudinal studies when events occur repeatedly over time. Various methods have been developed for the analysis of recurrent event data. Those methods include intensity-based counting process methods, mean function-based estimating equation methods and the analysis of times to events or times between events. The validity of those methods relies on the assumption that the covariates are correctly measured. This assumption, however, is not satisfied for many practical problems. It is often the case that some covariates are subject to measurement error. In this talk we will first briefly review the analysis of recurrent event data and then focus the discussion on inferential methods which account for measurement error in covariates. This is based on joint work with Jerry Lawless.
Nonlinear mixed-effects (NLME) models are popular in longitudinal studies. In these studies, however, subjects may drop out early and covariates may contain missing data. We propose likelihood and approximate methods for NLME models with dropouts and missing covariates, using Monte Carlo EM algorithms and MCMC methods. The approximate method is computationally more efficient than the likelihood method. A real dataset is analyzed using the proposed methods.
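In generic notation, the NLME model under discussion is

    \[
    y_{ij} = f(t_{ij}, \beta, b_i) + e_{ij}, \qquad b_i \sim N(0, D), \quad e_{ij} \sim N(0, \sigma^2),
    \]

with dropout and missing covariates handled by modeling their distributions jointly with (y_i, b_i); the Monte Carlo EM algorithm treats the random effects and the missing values as "missing data" in the E-step.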
Longitudinal studies often involve monitoring a dynamic process, and scientific interest lies in modeling covariate effects on the rates of transitions between states (e.g. Muenz and Rubinstein, 1985). Transition models are often used by assuming an underlying Markov process.
Our research focuses on settings of longitudinal transitional data with more complex types of association structure. Examples include i) cluster-randomized trials of families, school-based intervention studies, ii) multivariate multi-state processes, and iii) spatially correlated data. Under these settings, not only longitudinal but also cross-sectional associations must be taken into account. We give estimating equations for joint estimation and inference with transitional models for multivariate/clustered longitudinal multi-state data. The methods are based on GEE2 (Zhao and Prentice, 1990) and alternating logistic regression (Carey et al., 1993). These approaches enable one to model covariate effects on marginal transition probabilities as well as the association parameters, and improved efficiency can also result from the joint estimation.
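In the binary case, for instance, the two modeling targets can be written generically (the notation here is illustrative) as

    \[
    \operatorname{logit} P(Y_{ijt} = 1 \mid Y_{ij,t-1}, x_{ijt}) = \alpha\, Y_{ij,t-1} + x_{ijt}^{\top} \beta,
    \qquad
    \log \operatorname{OR}(Y_{ijt}, Y_{ij't'}) = z_{(jt)(j't')}^{\top} \gamma,
    \]

where i indexes clusters, j subjects, and t occasions; the first equation models covariate effects on the marginal transition probabilities, and the second models the cross-sectional and longitudinal association through pairwise odds ratios, as in alternating logistic regression.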
Several statistical problems of interest under investigation are as follows: methods for spatially correlated longitudinal multi-state data; random effects or mixed transitional models for cluster-randomized studies; and guidelines for the design of studies based on transitional models. Missing data are another common problem in longitudinal studies, and methods for handling incomplete data with transitional models under the above settings are also of interest.