
THE FIELDS
INSTITUTE
FOR RESEARCH IN MATHEMATICAL SCIENCES 
April
1618, 2012
Workshop on Graphical Models:
Mathematics, Statistics and Computer Sciences
Abstracts


Wicher Bergsma, London School of Economics
Marginal models: some interesting theoretical problems arising
from applications
Marginal models, which can be used to deal with nuisance dependencies
in data, are ubiquitous in applied statistics. For example, graphical
models typically impose conditional independence restrictions
on marginal distributions. This talk discusses some of the theoretical
problems that arise when analyzing realworld data. These relate
to finding the correct dimension of a model and detecting whether
or not the model is smooth.
Frank Critchley (Open University, UK) and Paul Marriott
(Waterloo, Canada)
Computational Information Geometry and Graphical Models
Recent developments in information geometry, with a distinctive
computational focus, are briefly reviewed together with their
potential applications in graphical models (and, more widely,
in statistical inference). In addition to algebraic geometric
formulations, the differential geometric structure of extended
exponential families is incorporated. In particular, computation
implications of different forms of statistical curvature, and
also boundary effects, are investigated. The computational aspects
of this talk focus on statistical and numerical issues. Theoretical
developments are illustrated through wellknown graphical model
examples.
Robin Evans, University of Cambridge
Constraints on Marginalized DAGs
Models defined by the global Markov property for directed
acyclic graphs (DAGs) possess many nice properties, including
a simple factorization, graphical criteria for determining independences
(dseparation/moralization), and computational tractability. As
is well known, however, DAG models are not closed under the operation
of marginalization, so marginalized DAGs (mDAGs) describe a much
larger and less well understood class of models.
mDAG models can be represented by graphs with directed and bidirected
edges, under a modest extension of Richardson's ADMGs. We describe
the natural Markov property for mDAGs, which imposes observable
conditional independences, 'dormant' independences (Verma constraints)
and inequality constraints, and provide some results towards its
characterization. In particular we describe the Markov equivalence
classes of mDAG models over 3 observed variables. We also give
a constructive proof that when 2 observed variables are not joined
by any edge (directed or bidirected), then this always induces
some constraint on the observed joint distribution.
Stephen E., Fienberg, Department of Statistics, Carnegie
Mellon University}
Graphical Models for Network Data
Graphs play an important role as a representation for two dual
representations of statistical modelsone where the nodes correspond
to units and the edges to variables that relate them to on another,
and the other where the nodes are variables and the edges represent
relationships among them. In the former units are inherently dependent
and relationships may or may not be, whereas in the later the
units are inherently independent and the focus is on independence
relationships among the variables. In this talk I describe both
types of representations involving discrete relationships or variables,
and there interplay, focusing on likelihood estimation for Exponential
Random Graph Models.
Thomas Kahle, ETH Zürich
Positive margins and primary decompositions
In algebraic statistics one uses Markov bases to assess the
goodness of fit in graphical and other loglinear models. Even
if Markov bases are known, for many models they are too big to
be practical. We study the consequences of replacing a Markov
basis of a graphical model by a simple set of moves coming from
conditional independence statements. While those moves do not
connect all fibers, they may still be sufficient to connect all
fibers satisfying certain linear constraints on the margins, for
example strict positivity. This is the positive margins property
and it can be studied systematically using decompositions of binomial
conditional independence ideals. This method allows to clarify
the positive margins property for graphical models of decomposable
graphs, cycles, and some complete bipartite graphs. (Joint work
with Johannes Rauh and Seth Sullivant)
Steffen Lauritzen, University of Oxford
Graphical Gaussian Models with Symmetry and Unknown Means
Generally it is nontrivial to estimate the mean of Gaussian
distribution under linear restrictions when the covariance structure
is given by a graphical model with symmetry, a wellknown example
being the BehrensFisher problem corresponding to a graph with
two nodes and no edge for the covariance structure, with the means
associated with the two nodes being different. We investigate
the problem of estimating the means under equality constraints
in a graphical model with symmetry and establish that the solution
to this problem is simple if and only if the partitioning of the
variables according to their means is equitable with respect to
the edge colouring of the symmetry models and is finer than the
partitioning induced by the vertex colouring. Examples and consequences
for design are briefly discussed. This represents joint work with
Helene Gehrmann [Gehrmann, H. And Lauritzen, S. 2011. Estimation
of Means in Graphical Gaussian Models with Symmetries. arXiv:1101.3709.
To appear in the Annals of Statistics.]
Grard Letac,Institut de Mathmatiques de Toulouse
Characteristic functions of convex sets and applications
to contingency tables
A contingency table starts from the data of finite sets $I_v$
where $v\in V$ and where $V$ is finite, and from a Bernoulli probability
$(p(i))$ on $I=\prod_{v\in I}I_v.$ More specifically consider $X_1,\ldots,X_N$
which are iid and valued in $\mathbb{R}^I$ (where $(e_i)_{i\in I}$
as the canonical basis of $\mathbb{R}^I$) such that
$\Pr(X_1=e_i)=p(i)$. The random array of integers $S_N=X_1+\cdots+X_N$
is the contingency table. A hierarchical model for the contingency
table is given by the data of a family $\mathcal{D}$ of non empty
subsets $D$ of $V$ such that $D_1 \subset D$ and $D\in \mathcal{D}$
imply $D_1\in \mathcal{D}.$ The model associated to $\mathcal{D}$
is the set of all probabilities $(p(i))_{i\in I}$ such that $p(i)>0$
for all i\inI and such that there exist functions $i\mapsto \lambda_D(i)$
depending only on $i_v$ such that $v\in D$ satisfying $\log p(i)=\sum_{D}\lambda_D(i).$
This is actually an exponential family. Example: choose $V=\{a,b,c\}$
and $\mathcal{D}$ equal to $\{ab,bc,a,b,c\}.$ That means that we
restrict ourselves to probabilities of the form $p(i_a,i_b,i_c)=q(i_a,i_b)r(i_b,i_c).$
Having observed the contingency table $S_N,$ deciding whether
the model $\mathcal{D}$ is the good one or not is a difficult
problem. We attack it by the Bayesian factor method, which compares
two models $\mathcal{D}$ and $\mathcal{D}'.$ The tool for this
is a not very well known object, namely the characteristic function
$m\mapsto J_C(m)$ on a bounded (say) convex set $C$. It can be
defined in a number of ways whose the shortest (but not the most
transparent) is the volume of the polar set of $Cm.$ A better
way to introduce $J_C$ here would be through the DiaconisYlvizaker
integral of an exponential family. To each model $\mathcal{D}$
we attach the convex polytope $C$ of the support of the exponential
family above, and we compute $A=J_C(m(S_N/N))$ where $s\mapsto
m(s)$ is a canonical projection associated to $\mathcal{D}.$ The
ratio of the numbers $A$ and $A'$ corresponding to models $\mathcal{D}$
and $\mathcal{D}'$ is the essential ingredient for the computation
of the Bayesian factor. I will say more about the statistical
uses of the characteristic function $J_C,$ specially when the
hierarchical model is a graphical model attached to a triangulated
graph.
Luigi Malagò, Department of Electronics and Information,
Politecnico di Milano
Graphical Models and ModelBased Search Algorithms
Modelbased Search (MBS) is a paradigm in stochastic optimization
where the search for the minimum of a function is guided by a
statistical model, i.e., a set of probability distributions defined
over the search space. The optimum is obtained by sampling from
a sequence of probability distributions that minimize the expected
value of the function, estimated from a sample of points. The
framework is rather general and includes algorithms from Evolutionary
Computation and Stochastic Optimization. In this talk we focus
on Estimation of Distribution Algorithms (EDAs) which generate
sequences of distributions by iteratively selecting a subsample
according to the value of the function, estimating the parameters
of a distribution, and then sampling a new set of points. Most
EDAs use factorizations of the joint probability distribution
based on probabilistic graphical models. Two popular examples
are BOA, which employs Bayesian Networks, and DEUM, based on Markov
Random Fields. In blackbox contexts, when the analytic formula
of function to be optimized in unknown, it becomes crucial to
estimate a good model for the function, able to capture the correlations
among the variables, to increase the probability of finding the
global optimum. For this reason, EDAs, and more in general MBS
algorithms, need efficient algorithms for model selection, to
be applied in the high dimensional setting. We present a review
of the current state of the art, together with main issues and
recent advances, about the use of graphical models in EDA, with
particular focus on Markov Random Fields.
Sofia Massa, University
of Oxford
Estimating combination of graphical models
In some recent applications, the interest is in combining
information about relationships between variables from independent
studies performed under partially comparable circumstances. In
this talk weintroduce some motivating examples of the research
question, present some relevant types of combinations and discuss
issues related to the estimation of the parameters of the graphical
models combination.
Frantisek Matús,Institute
of Information Theory and Automation,Academy of Sciences of the
Czech Republic}
Around boundaries of exponential families
Limiting in the standard exponential families of probability
measures on Euclidean spaces will be reviewed, in particular,
when the mean goes along a segment inside the convex support of
such a family towards an endpoint on the boundary. Firstorder
behavior of the corresponding probability measures, information
divergences and variance functions can be described when the family
has finite support. This has implications on maximization of distances
from exponential families and on the problem of classification
of exponential families with special variance functions.
Mehdi Molkaraie, Department of Information Technology and
Electrical Engineering, ETH Zurich
Importance Sampling to Estimate the Entropy Function in Graphical
Models with Cycles"
We propose a multilayer importance sampling algorithm to estimate
the entropy of the output process of noisy source/channel models.
We focus on graphical models defined in terms of factor graphs
and carry out computations using a doubleloop algorithm on the
model. Numerical results for twodimensional noisy source/channel
models with nontrivial Markov structure are presented. Based
on joint work with HansAndrea Loeliger.
Guido Montufar, Pennsylvania State University, USA
The number of layers in a Deep Belief Network
Graphical models with several levels of unobserved variables can
represent certain probability distributions way more compactly
than models with fewer levels of unobserved variables. The most
accepted picture is that, depending on the target distributions,
there is a right choice of the number of levels and the number
of variables in each level. In this talk I discuss this tradeoff
of hyperparameters for the specific class of models deep belief
network.
Bala Rajaratnam, Stanford University
Positive definite completion problems for directed acyclic
graphs
A positive definite completion problem pertains to determining
whether the unspecified positions of a partial (or incomplete)
matrix can be completed in a desired subclass of positive definite
matrices. In this talk we consider an important and new class
of positive definite completion problems where the desired subclasses
are the spaces of covariance and inverse covariance matrices of
probabilistic models corresponding to directed acyclic graph models
(also known as Bayesian networks). We provide fast procedures
that determine whether a partial matrix can be completed in either
of these spaces and thereafter proceed to construct the completed
matrices. We prove an analog of the positive definite completion
result for undirected graphs in the context of directed acyclic
graphs, and thus proceed to characterize the class of DAGs which
can always be completed. We also proceed to give closed form expressions
for the inverse and the determinant of a completed matrix as a
function of only the elements of the corresponding partial matrix.
Johannes Rauh, Max Planck Institute for Mathematics in the
Sciences, Leipzig
The expressive power of mixture models and Restricted Boltzmann
Machines
Mixture models and Restricted Boltzmann Machines (RBMs) are
graphical models with hidden nodes that are widely used in statistical
modelling and machine learning. They are semialgebraic sets with
a complicated geometry, and a complete description in terms of
polynomial equalities and inequalities is unknown. Recent results
show that in many cases the dimension of the model equals the
expected dimension. Nevertheless, even if the model has the full
dimension, there may be large sets of probability distributions
that lie outside of the model. This shows that the dimension alone
does not suffice to describe the expressive power of these models.
In my talk I want to discuss results of a collaboration with Guido
Montúfar about sets of probability distributions that RBMs
and mixture models can or cannot approximate well.
Alessandro Rinaldo, Carnegie Mellon University
Maximum Likelihood Estimation in LogLinear Models
We study maximum likelihood estimation in loglinear models under
conditional Poisson sampling schemes. We derive necessary and
sufficient conditions for existence of the maximum likelihood
estimator (MLE) of the model parameters and investigate estimability
of the natural and meanvalue parameters under a nonexistent
MLE. Our conditions focus on the role of sampling zeros in the
observed table. We situate our results within the general framework
of extended exponential families and we rely in a fundamental
way on key geometric properties of loglinear models. We propose
algorithms for extended maximum likelihood estimation that improve
and correct existing algorithms for loglinear model analysis
used in virtually all standard statistical software packages.
(Joint work with Stephen Fienberg.)
Kayvan Sadeghi, University of Oxford
Stable Mixed Graphs
In this talk we discuss classes of graphs with three types
of edge that capture the modified independence structure of a
directed acyclic graph (DAG) after marginalisation over unobserved
variables and conditioning on selection variables using the $m$separation
criterion. These include MC, summary, and ancestral graphs. As
a modification of MC graphs we define the class of ribbonless
graphs (RGs) that permits the use of the $m$separation criterion.
RGs contain summary and ancestral graphs, and each RG can be generated
by a DAG after marginalisation and conditioning. We derive simple
algorithms to generate RGs from given DAGs or RGs, and also to
generate summary and ancestral graphs in a simple way by further
extension of the RGgenerating algorithm. By using these algorithms
we develop a parallel theory on the three classes, and study the
relationships between them as well as the use of each class.
Milan Studeny, Academy of Sciences of the Czech Republic/Institute
of Information Theory and Automation}
Integer programming approach to statistical learning graphical
models
At first, it will be explained how the statistical task of
learning Bayesian network structure, which is a special graphical
model of conditional independence structure, can be transformed
to a linear optimization task to maximize a certain linear function
(determined by data) over a special integral polytope. Then
suitable linear transformation of the optimization problem will
be described, which allows to consider a polytope with zeroone
vectors as vertices. Finally, by means of the concept of LP
relaxation of the polytope, the transformed optimization task
will be reformulated in the form an integer programming problem,
which seems to offer the way to solve the optimization task
in practice.
The talk is based on joint research with Raymond Hemmecke, Silvia
Lindner (both TU Munich) and David Haws (University of Kentucky).
Seth Sullivant,North Carolina State University
Identifiability of Large Phylogenetic Mixture Models
Phylogenetic mixture models are statistical models of character
evolution allowing for heterogeneity. Each of the classes in
some unknown partition of the characters may evolve by different
processes, or even along different trees. The fundamental question
of whether parameters of such a model are identifiable is difficult
to address, due to the complexity of the parameterization. We
analyze mixture models on large trees, with many mixture components,
showing that both numerical and tree parameters are indeed identifiable
in these models when all trees are the same. The techniques
we employ come from the algebraic geometry of tensors and are
likely to be useful for other hidden variable graphical models.
This is joint work with John Rhodes (U. of AlaskaFairbanks)
Henry Wynn, London School of Economics, UK
Hierarchical models and monomial ideals
The connections between graphical models and certain algebraic
structures, under a heading of ``algebraic statistics",
for example the link with toric ideals, is wellestablished.
The links have primarily been made for Poisson/multinomial models
and for the Gaussian case. Here it is shown how to introduce
algebraic concepts into quite general hierarchical and graphical
models. The concept of a local (differential) cumulant is developed
and related to hierarchical models, covering independence and
conditional independence in a natural way. Then, the isomorphism
between a type of differential ideal and a monomial ideal is
used to complete the connection.
This allows a translation between certain algebraic properties
such as {\em 2linear} and {\em shellable} and conditions for
hierarchical models and with statistical interpretations such
as decomposability and the lattice conditional independence
(LCI) property. In theory, interesting classes of ideal should
thus lead to interesting, and possible new, classes of statistical
models. A few examples are given. The differential ideal property
also leads to special families of distributions.
Jing Xi, University of Kentucky
Estimating the number of zeroone multiway tables via sequential
importance sampling
In 2005, Chen et al introduced a sequential importance sampling
(SIS) procedure to analyze zeroone twowaytables with given
fixed marginal sums (row and column sums) via the conditional
Poisson (CP) distribution. They showed that compared with Monte
Carlo Markov chain (MCMC)based approaches, their importance
sampling method is more efficient in terms of running time and
also provides an easy and accurate estimate of the total number
of contingency tables with fixed marginal sums. In this talk
we extend their result to zeroone multiway ($d$way, $d \geq
2$) contingency tables under the no $d$way interaction model,
i.e., with fixed $d  1$ marginal sums. Also we show by simulations
that the SIS procedure with CP distribution to estimate the
number of zeroone threeway tables under the no threeway interaction
model given marginal sums works very well even with some rejections.
We also applied our method to Samson's monks' data set. We end
with further questions on the SIS procedure on zeroone multiway
tables.
Ruriko Yoshida, University of Kentucky
Normality of the threestate toric homogeneous Markov chain
model
A discrete time Markov chain, a stochastic process with Markov
property, has been applied in many fields, such as physics,
chemistry, information sciences, economics, finances, mathematical
biology, social sciences, and statistics. A discrete time Markov
chain is often used as a statistical model from a random physical
process to fit the observe the data. A timehomogeneous Markov
chain is a process that each transition probability from a state
to a state does not depend on time. It is important to test
if the observed data set fits the assumption of the timehomogeneity
of the chain and in 2011, Hara and Takemura suggested a Markov
Chain Monte Carlo (MCMC) approach to a goodnessoffit test
for the timehomogeneity of the chain using Markov bases. In
this talk we focus on a timehomogeneous Markov chain (THMC)
model with three states and without any selfloops, i.e., the
probability from a state to itself is zero, and we study algebraic
and polyhedral geometric properties of Markov bases for the
THMC model with any time $T > 0$. More specifically, we showed
the number of facets of the convex hull generated by the columns
of the design matrix for the threestate THMC model without
loops for any time $T \geq 5$ is $24$, and we showed its explicit
hyperplane representation of the convex hull for any $T \geq
4$. Using this result we then showed that the semigroup generated
by the columns of the design matrix is normal for any $T >
0$ and finally we showed that a Markov basis for the toric ideal
associate with the design matrix for the threestate THMC model
without loops consists of binomials of degree less than or equal
to $d= 6$. We end this talk with conjectures on further analyses
on Markov bases for the THMC model.
This is joint work with D. Haws, A., Mart\'in del Campo, and
A. Takemura.
Piotr Zwiernik, TU Eindhoven
Graphical Gaussian models and their groups
Let $G$ be an undirected graph with $n$ nodes defining the graphical
Gaussian model $M(G)$ parametrized by the set $P(G)$ of all
concentration (symmetric, positive definite) matrices with zeros
corresponding to nonedges of $G$. We describe the stabilizer
in $GL_n(\mathbb{R})$ of the model, in the natural action of
$GL_n(\mathbb{R})$ on symmetric matrices. This group gives the
representation of graphical Gaussian models as composite transformation
families. This has important consequences for the study of the
maximum likelihood inference for this model class which I hope
to discuss in the second part of the talk.
This is a joint work with Jan Draisma and Sonja Kuhnt.
Back to top

