Speaker
Abstracts:
Richard
Cook, University of Waterloo
Robust Estimation of Mean Functions and Treatment Effects
for Recurrent Events under Event-Dependent Censoring and
Termination: Application to Skeletal Complications in
Cancer Metastatic to Bone
Recurrent
events frequently arise as responses in clinical trials
and other experimental settings. The definition and estimation
of treatment effects involves a number of interesting
issues, especially when loss to followup may be event-related
and when there are terminal events such as death, which
preclude the occurrence of further events. We discuss
a clinical trial of breast cancer patients with bone metastases,
where the recurrent events are skeletal complications,
and where patients may die during the trial. A number
of ways to specify treatment effects for recurrent event
responses have been described. We argue here in favor
of measures based on marginal rate and mean functions;
estimates of marginal rate and mean functions are very
useful as data summaries and as a basis for measuring
the effects of treatment or other baseline covariates.
When recurrent event data are subject to event-dependent
censoring ordinary marginal methods may, however, yield
inconsistent estimates.
Incorporating
inverse probability of censoring weights into the relevant
estimating equations can protect against such censoring
and yield consistent estimates of marginal features, provided
the censoring process is correctly specified. An alternative
approach is to obtain estimates of rate and mean functions
from intensity-based models that render censoring
conditionally independent, but this often complicates
the interpretation of covariate effects. We consider
three methods of estimating mean functions of recurrent
event processes and examine the bias and efficiency of
unweighted and inverse probability weighted versions of
the methods. The methods are extended to deal with the
presence of a terminating event. We compare the methods
via simulation and use them to deal with the clinical
trial.
This
is joint work with Jerry Lawless, Lajmi Lakhal-Chaieb
and Ker-Ai Lee.
_________________________________________________________
Christodoulos
A. Floudas, Princeton University
Advances in Biclusering Methods for Re-Ordering Data
Matrices in Systems Biology, Drug Discovery, and Prediction
of In Vivo Toxicities from In Vitro Experimental Data
Biclustering has emerged as an important problem in the
analysis of gene expression data since genes may only
jointly respond over a subset of conditions. Many of the
methods for biclustering, and clustering algorithms in
general, utilize simplified models or heuristic strategies
for identifying the best grouping of elements
according to some metric and cluster definition and thus
result in suboptimal clusters.
In
the first part of the presentation, we present a rigorous
approach to biclustering, OREO, which is based on the
Optimal RE-Ordering of the rows and columns of a data
matrix so as to globally minimize the dissimilarity metric
[1,2]. The physical permutations of the rows and columns
of the data matrix can be modeled as either a network
flow problem or a traveling salesman problem. The performance
of OREO is tested on several important data matrices arising
in systems biology to validate the ability of the proposed
method and compare it to existing biclustering and clustering
methods.
In
the second part of the talk, we will focus on novel methods
for clustering of data matrices that are very sparse [3].
These types of data matrices arise in drug discovery where
the x- and y-axis of a data matrix can correspond to different
functional groups for two distinct substituent sites on
a molecular scaffold. Each possible x and y pair corresponds
to a single molecule which can be synthesized and tested
for a certain property, such as percent inhibition of
a protein function. For even moderate size matrices, synthesizing
and testing a small fraction of the molecules is labor
intensive and not economically feasible. Thus, it is of
paramount importance to have a reliable method for guiding
the synthesis process to select molecules that have a
high probability of success. In the second part of the
presentation, we introduce a new strategy to enable efficient
substituent reordering and descriptor-free property estimation.
Our approach casts substituent reordering as a special
high-dimensional rearrangement clustering problem, eliminating
the need for functional approximation and enhancing computational
efficiency [4, 5]. Deterministic optimization approaches
based on mixed-integer linear programming can provide
guaranteed convergence to the optimal substituent ordering.
The proposed approach is demonstrated on a sparse data
matrix (about 29% dense) of inhibition values for 14,043
unknown compounds provided by Pfizer Inc. It is shown
that an iterative synthesis strategy is able to uncover
a significant percentage of the lead molecules while using
only a fraction of total compound library, even when starting
from a mere 3\% of the total library space.
In the third part of the presentation, we combine the
strengths of integer linear optimization and machine learning
to predict in vivo toxicities for a library of pesticide
chemicals using only in vitro data. Our approach utilizes
a biclustering method based on iterative optimal re-ordering
[1,2] to identify biclusters corresponding to subsets
of chemicals that have similar responses over distinct
subsets of the in vitro assays. This enables us to determine
subsets of experimental assays that are most likely to
be correlated with toxicity, according to the in vivo
data set. An optimal method based on integer linear optimization
(ILP) for re-ordering sparse data matrices [3] is also
applied to the in vivo dataset (21.3% sparse) in order
to cluster endpoints that have similar lowest effect level
(LEL) values, where it is observed that endpoints are
grouped according to similar physiological attributes.
Based upon the clustering results of the in vitro and
in vivo data sets, logistic regression is then utilized
to (a) learn the correlation between the subsets of in
vitro data and the in vivo responses, and (b) subsequently
predict the toxicity signatures of the chemicals. Our
approach aims to find the highest prediction accuracy
using the minimum number of in vitro descriptors.
[1]
DiMaggio, P. A. and McAllister, S. R. and Floudas, C.
A., and Feng, X. J. and Rabinowitz, J. D. and Rabitz,
H. A., "Biclustering via Optimal Re-ordering of Data
Matrices in Systems Biology: Rigorous methods and comparative
studies", BMC Bioinformatics, 9, 458 (2008).
[2] DiMaggio, P. A. and McAllister, S. R. and Floudas,
C. A., and Feng, X. J. and Rabinowitz, J. D. and Rabitz,
H. A., "A Network Flow Model for Biclustering via
Optimal Re-ordering of Data Matrices", J. Global
Optimization, in press, (2010).
[3 ] McAllister S.R., DiMaggio P.A., and C.A. Floudas,
"Mathematical Modeling and Efficient Optimization
Methods for the Distance-Dependent Rearrangement Clustering
Problem", J. of Global Optimization, 45, 111-129,
(2009).
[4] McAllister, S. R. and Feng, X. J. and DiMaggio, P.
A. and Floudas, C. A., and Rabinowitz, J. D. and Rabitz,
H. A., "Descriptor-free molecular discovery in large
libraries by adaptive substituent reordering", Bioorganic
& Medicinal Chemistry Letters, 18, 5967-5970, (2008).
[5] DiMaggio, P. A. and McAllister, S. R. and Floudas,
C. A., and Feng, X. J. and Rabinowitz, J. D. and Rabitz,
H. A., "Enhancing Molecular Discovery Using Descriptor-
Free Rearrangement Clustering Techniques for Sparse Data
Sets", AIChE J., 56(2), 405-418, (2010)
_________________________________________________________
Hamid
Ghaffari, University of Toronto
A semi-infinite linear programming approach to Gamma
Knife Perfexion
Perfexion,
the latest model of Gamma Knife treatment units, is used primarily
for radiosurgery treatments of brain tumors and lesions.
Treatments for Perfexion must be designed entirely manually,
which can lead to errors or sub-optimality treatments.
We model the Perfexion treatment planning problem as a
second-order conic optimization problem, and solve
it using a semi-infinite linear programming approach.
The resulting treatments plans meet or exceed clinical
guidelines for both radiosurgery and radiotherapy treatments,
and can be obtained in reasonable period of time.
_________________________________________________________
Mario
R. Guarracino, Italian National Research Council
Mathematical Models of Supervised Learning and their
Application to Medical Diagnosis
Supervised
learning refers to the ability of a system to learn from
a set of examples, represented by a set of input / output
pairs. This set of examples is referred to as the training
set. A training system is able to provide a answer (output)
to each new question (input). The term supervised derives
from the fact that the output for all training examples
is provided by an expert. In the simplest case, when the
response is one of only two potential possibilities, it
is referred to as binary classification.
Supervised learning models are applicable in many fields
of science and technology, such as economics, engineering
and medicine. In this study we focus on the discrimination
among differ types of leukemia.
Among supervised learning algorithms, there are the so-called
Support Vector Machines (SVM), exhibiting accurate solutions
and low training time. They are based on the statistical
learning theory and provide the solution by minimizing
a quadratic type cost function. SVM, in conjunction with
the use of kernel methods, provide non-linear classification
models, namely separations that cannot be expressed using
inequalities on linear combinations of parameters.
There are some problems that may reduce the effectiveness
of these methods. In fact, we are sometimes faced with
very large training sets. For example, in multi-center
clinical trials, experts from different institutions collect
data on many patients. In this case, techniques currently
in use determine the model considering all the available
data. Although they are well suited to cases under consideration,
they do not provide accurate answers in general. Therefore,
it is necessary to identify a subset of the training set
which contains all available information, providing a
model that still generalizes to new testing data.
It is also possible that the training sets vary over time,
for example, because data are added and modified as a
result of new tests or new knowledge. In this case, the
current techniques are not able to capture the changes,
but need to start the learning process from the beginning.
The techniques, which extract only the new knowledge contained
in the data and provide the learning model in an incremental
way, have the advantage of taking into account only the
experiments really useful and speed up the analysis.
In this talk, we describe some solutions to these problems,
with the support of numerical experiments.
_________________________________________________________
Igor
Jurisica, University of Toronto
Systematic identification and characterization of cancer
biomarkers
Cancer
development is a multi-step process that leads to uncontrolled
tumor cell growth. Multiple pathways are involved, typically
some signalling and regulatory pathways are activated,
while others are suppressed. Existing prognostic and predictive
biomarkers overlap only partially, and often do not validate
on external cohorts or by other biological assays. Several
technical reasons may explain this. Further evidence shows
that several clinically identical gene signatures exist,
and that the best signature may not comprise the most
differential genes.
To
address these challenges, we developed a system for an
integrative analysis, prediction and characterization
of signatures and relevant protein-protein interactions.
These computational predictions will lead to improved
human interactome coverage relevant to both basic and
cancer biology, and, importantly, may accelerate the development
of novel therapies for many cancers.
_________________________________________________________
Ioannis
A. Kakadiaris, University of Houston
Challenges and Opportunities for Extracting Cardiovascular
Risk Biomarkers from non-contrast CT data
In
this talk, I will first offer a short overview of the
research activities of the Computational Biomedicine Laboratory,
University of Houston. Then, I will present our research
in the area of biomedical image computing for the mining
of information from cardiovascular imaging data for the
detection of persons with a high likelihood of developing
a heart attack in the near future (vulnerable patients).
Specifically, Ill present methods for detection
and segmentation of anatomical structures, and shape and
motion estimation of dynamic organs. The left ventricle
in non-invasive cardiac MRI data is extracted using a
new multi-class, multi-feature fuzzy connectedness method
and deformable models for shape and volume estimation.
In non-invasive cardiac CT data, the thoracic fat is detected
using a relaxed version of multi-class, multi-feature
fuzzy connectedness method. Additionally, the calcified
lesions in the coronary arteries are identified and quantified
using a hierarchical supervised learning framework from
the CT data. In non-invasive contrast-enhanced CT, the
coronary arteries are detected using our tubular shape
detection method for motion estimation and, possibly,
for non-calcified lesion detection. In invasive IVUS imaging,
our team has developed a unique IVUS acquisition protocol
and novel signal/image analysis methods for the detection
(for the first time in-vivo) of vasa vasorum
(VV). The VV are micro-vessels that are commonly present
to feed the walls of larger vessels; however, recent clinical
evidence has uncovered their tendency to proliferate around
areas of inflammation, including the inflammation associated
with vulnerable plaques. In summary, our work is focused
on developing innovative computational tools to mine quantitative
parameters from imaging data for early detection of asymptomatic
cardiovascular patients. The expected impact of our work
stems from the fact that sudden heart attack remains the
number one cause of death in the US, and unpredicted heart
attacks account for the majority of the $280 billion burden
of cardiovascular diseases.
_________________________________________________________
Panos
M. Pardalos, University of Florida
Consistent Biclustering via Nonlinear 0-1 Optimization
Biclustering
is a data mining technique that consists in simultaneous
partitioning of the set of samples and the set of their
attributes (features) into subsets (classes). Samples
and features classified together are supposed to have
a high relevance to each other which can be observed by
intensity of their expressions. We have defined the notion
of consistency for biclustering using interrelation between
centroids of sample and feature classes. We have proved
that consistent biclustering implies separability of the
classes by convex cones.
In
this talk, we discuss both supervised and unsupervised
learning algorithms, where the biclustering consistency
is achieved by feature selection and outlier detection,
and also simulation studies investigating the probability
of existence of a consistent biclustering
on random data. The developed models involve solution
of a fractional 0-1 programming problem. Computational
results on DNA microarray data mining problems are reported.
_________________________________________________________
Kostas
Tsakalis, University of Arizona
Feedback Control for Epileptic Seizure Suppression
Experimental
observations indicate that transitions to epileptic seizures
are preceded by some form of increased synchronization.
Similar behavior can be observed by varying the parameters
in coupled chaotic oscillators, coupled neural populations,
and cortical neural populations with inhibitory-excitatory
inputs. Such mechanisms are very attractive tools to understand
basic mechanisms producing seizure-like phenomena, as
they explore general network properties and functionality
and do not rely on specific mathematical models. Their
simulation models also allow the quick investigation and
preliminary evaluation of strategies for seizure prediction
and seizure suppression.
Viewing
feedback action as a functional model achieving homeostasis
in various quantities like firing rate or neural output
correlation, we model the generation of seizures as functional
behavior of the populations rather than a generic connectivity
property. We then examine the effectiveness of a variety
of stimulation strategies in the suppression of seizures.
Our objective is to maintain normal activity
in terms of observed measures like synchrony, correlation
and amplitude in neural systems that are subject to extrinsic
and intrinsic perturbations. We evaluate several external
seizure-control paradigms like open loop periodic stimulation,
closed-loop ``modulating'' control with predefined stimuli,
and closed-loop feedback decoupling control. While the
precise implementation of these seizure suppression strategies
is an open issue, both simulation models and preliminary
experimental data from animal models indicate that the
use of multiple sites for data collection and stimulation
is essential for handling a wide range of seizure types.
_________________________________________________________
Stephen
Wright, University of Wisconsin-Madison
Developments in Optimization algorithms for data analysis
Data
analysis problems from biomedical informatics and related
areas often have large size and/or complexity, particularly
when interactions between effects are considered as predictors
along with main effects. Moreover, scientists often seek
simple, approximate solutions, which can be obtained from
optimization formulations with nonsmooth regularization
terms. The demanding features of these optimization problems
are driving development of algorithms that account for
the difficulty of the problem, the size of the data sets,
the presence of regularization terms, and the type of
solutions needed. We review algorithmic developments for
several classes of problems that arise in biomedical informatics
and other areas.
_________________________________________________________
Petros
Xanthopoulos, University of Florida
Data mining approaches for a multiclass cell death
discrimination problem
Accurate cell death discrimination is a time consuming
and expensive process that can only be performed in biological
laboratories. Nevertheless, it is very useful and arises
in many biological and medical applications. In this paper
we propose a data mining approach, applied on Raman spectroscopy
data, for a multiclass cell death discrimination problem.
We combine a regularized generalized eigenvalue algorithm
for multiple classification, together with a novel dimensionality
reduction algorithm based on spectral clustering. Results
show that the proposed algorithmic scheme can classify
cells from three different classes (apoptotic death, necrotic
death and control cells) with over 97% accuracy.