## PROGRAMS AND ACTIVITIES

September 26, 2017

THE FIELDS INSTITUTE
FOR RESEARCH IN MATHEMATICAL SCIENCES
 April 16-18, 2012 Workshop on Graphical Models: Mathematics, Statistics and Computer Sciences Abstracts

Wicher Bergsma, London School of Economics

Marginal models: some interesting theoretical problems arising from applications
Marginal models, which can be used to deal with nuisance dependencies in data, are ubiquitous in applied statistics. For example, graphical models typically impose conditional independence restrictions on marginal distributions. This talk discusses some of the theoretical problems that arise when analyzing real-world data. These relate to finding the correct dimension of a model and detecting whether or not the model is smooth.

Frank Critchley (Open University, UK) and Paul Marriott (Waterloo, Canada)

Computational Information Geometry and Graphical Models
Recent developments in information geometry, with a distinctive computational focus, are briefly reviewed together with their potential applications in graphical models (and, more widely, in statistical inference). In addition to algebraic geometric formulations, the differential geometric structure of extended exponential families is incorporated. In particular, computation implications of different forms of statistical curvature, and also boundary effects, are investigated. The computational aspects of this talk focus on statistical and numerical issues. Theoretical developments are illustrated through well-known graphical model examples.

Robin Evans, University of Cambridge

Constraints on Marginalized DAGs
Models defined by the global Markov property for directed acyclic graphs (DAGs) possess many nice properties, including a simple factorization, graphical criteria for determining independences (d-separation/moralization), and computational tractability. As is well known, however, DAG models are not closed under the operation of marginalization, so marginalized DAGs (mDAGs) describe a much larger and less well understood class of models.
mDAG models can be represented by graphs with directed and bidirected edges, under a modest extension of Richardson's ADMGs. We describe the natural Markov property for mDAGs, which imposes observable conditional independences, 'dormant' independences (Verma constraints) and inequality constraints, and provide some results towards its characterization. In particular we describe the Markov equivalence classes of mDAG models over 3 observed variables. We also give a constructive proof that when 2 observed variables are not joined by any edge (directed or bidirected), then this always induces some constraint on the observed joint distribution.

Stephen E., Fienberg, Department of Statistics, Carnegie Mellon University}

Graphical Models for Network Data
Graphs play an important role as a representation for two dual representations of statistical models---one where the nodes correspond to units and the edges to variables that relate them to on another, and the other where the nodes are variables and the edges represent relationships among them. In the former units are inherently dependent and relationships may or may not be, whereas in the later the units are inherently independent and the focus is on independence relationships among the variables. In this talk I describe both types of representations involving discrete relationships or variables, and there interplay, focusing on likelihood estimation for Exponential Random Graph Models.

Thomas Kahle, ETH Zürich

Positive margins and primary decompositions
In algebraic statistics one uses Markov bases to assess the goodness of fit in graphical and other log-linear models. Even if Markov bases are known, for many models they are too big to be practical. We study the consequences of replacing a Markov basis of a graphical model by a simple set of moves coming from conditional independence statements. While those moves do not connect all fibers, they may still be sufficient to connect all fibers satisfying certain linear constraints on the margins, for example strict positivity. This is the positive margins property and it can be studied systematically using decompositions of binomial conditional independence ideals. This method allows to clarify the positive margins property for graphical models of decomposable graphs, cycles, and some complete bipartite graphs. (Joint work with Johannes Rauh and Seth Sullivant)

Steffen Lauritzen, University of Oxford

Graphical Gaussian Models with Symmetry and Unknown Means
Generally it is non-trivial to estimate the mean of Gaussian distribution under linear restrictions when the covariance structure is given by a graphical model with symmetry, a well-known example being the Behrens-Fisher problem corresponding to a graph with two nodes and no edge for the covariance structure, with the means associated with the two nodes being different. We investigate the problem of estimating the means under equality constraints in a graphical model with symmetry and establish that the solution to this problem is simple if and only if the partitioning of the variables according to their means is equitable with respect to the edge colouring of the symmetry models and is finer than the partitioning induced by the vertex colouring. Examples and consequences for design are briefly discussed. This represents joint work with Helene Gehrmann [Gehrmann, H. And Lauritzen, S. 2011. Estimation of Means in Graphical Gaussian Models with Symmetries. arXiv:1101.3709. To appear in the Annals of Statistics.]

Grard Letac,Institut de Mathmatiques de Toulouse

Characteristic functions of convex sets and applications to contingency tables
A contingency table starts from the data of finite sets $I_v$ where $v\in V$ and where $V$ is finite, and from a Bernoulli probability $(p(i))$ on $I=\prod_{v\in I}I_v.$ More specifically consider $X_1,\ldots,X_N$ which are iid and valued in $\mathbb{R}^I$ (where $(e_i)_{i\in I}$ as the canonical basis of $\mathbb{R}^I$) such that
$\Pr(X_1=e_i)=p(i)$. The random array of integers $S_N=X_1+\cdots+X_N$ is the contingency table. A hierarchical model for the contingency table is given by the data of a family $\mathcal{D}$ of non empty subsets $D$ of $V$ such that $D_1 \subset D$ and $D\in \mathcal{D}$ imply $D_1\in \mathcal{D}.$ The model associated to $\mathcal{D}$ is the set of all probabilities $(p(i))_{i\in I}$ such that $p(i)>0$ for all i\inI and such that there exist functions $i\mapsto \lambda_D(i)$ depending only on $i_v$ such that $v\in D$ satisfying $\log p(i)=\sum_{D}\lambda_D(i).$ This is actually an exponential family. Example: choose $V=\{a,b,c\}$ and $\mathcal{D}$ equal to $\{ab,bc,a,b,c\}.$ That means that we restrict ourselves to probabilities of the form $p(i_a,i_b,i_c)=q(i_a,i_b)r(i_b,i_c).$

Having observed the contingency table $S_N,$ deciding whether the model $\mathcal{D}$ is the good one or not is a difficult problem. We attack it by the Bayesian factor method, which compares two models $\mathcal{D}$ and $\mathcal{D}'.$ The tool for this is a not very well known object, namely the characteristic function $m\mapsto J_C(m)$ on a bounded (say) convex set $C$. It can be defined in a number of ways whose the shortest (but not the most transparent) is the volume of the polar set of $C-m.$ A better way to introduce $J_C$ here would be through the Diaconis-Ylvizaker integral of an exponential family. To each model $\mathcal{D}$ we attach the convex polytope $C$ of the support of the exponential family above, and we compute $A=J_C(m(S_N/N))$ where $s\mapsto m(s)$ is a canonical projection associated to $\mathcal{D}.$ The ratio of the numbers $A$ and $A'$ corresponding to models $\mathcal{D}$ and $\mathcal{D}'$ is the essential ingredient for the computation of the Bayesian factor. I will say more about the statistical uses of the characteristic function $J_C,$ specially when the hierarchical model is a graphical model attached to a triangulated graph.

Luigi Malagò, Department of Electronics and Information, Politecnico di Milano

Graphical Models and Model-Based Search Algorithms
Model-based Search (MBS) is a paradigm in stochastic optimization where the search for the minimum of a function is guided by a statistical model, i.e., a set of probability distributions defined over the search space. The optimum is obtained by sampling from a sequence of probability distributions that minimize the expected value of the function, estimated from a sample of points. The framework is rather general and includes algorithms from Evolutionary Computation and Stochastic Optimization. In this talk we focus on Estimation of Distribution Algorithms (EDAs) which generate sequences of distributions by iteratively selecting a subsample according to the value of the function, estimating the parameters of a distribution, and then sampling a new set of points. Most EDAs use factorizations of the joint probability distribution based on probabilistic graphical models. Two popular examples are BOA, which employs Bayesian Networks, and DEUM, based on Markov Random Fields. In black-box contexts, when the analytic formula of function to be optimized in unknown, it becomes crucial to estimate a good model for the function, able to capture the correlations among the variables, to increase the probability of finding the global optimum. For this reason, EDAs, and more in general MBS algorithms, need efficient algorithms for model selection, to be applied in the high dimensional setting. We present a review of the current state of the art, together with main issues and recent advances, about the use of graphical models in EDA, with particular focus on Markov Random Fields.

Sofia Massa, University of Oxford

Estimating combination of graphical models
In some recent applications, the interest is in combining information about relationships between variables from independent studies performed under partially comparable circumstances. In this talk weintroduce some motivating examples of the research question, present some relevant types of combinations and discuss issues related to the estimation of the parameters of the graphical models combination.

Frantisek Matús,Institute of Information Theory and Automation,Academy of Sciences of the Czech Republic}

Around boundaries of exponential families
Limiting in the standard exponential families of probability measures on Euclidean spaces will be reviewed, in particular, when the mean goes along a segment inside the convex support of such a family towards an endpoint on the boundary. First-order behavior of the corresponding probability measures, information divergences and variance functions can be described when the family has finite support. This has implications on maximization of distances from exponential families and on the problem of classification of exponential families with special variance functions.

Mehdi Molkaraie, Department of Information Technology and Electrical Engineering, ETH Zurich

Importance Sampling to Estimate the Entropy Function in Graphical Models with Cycles"
We propose a multi-layer importance sampling algorithm to estimate the entropy of the output process of noisy source/channel models. We focus on graphical models defined in terms of factor graphs and carry out computations using a double-loop algorithm on the model. Numerical results for two-dimensional noisy source/channel models with non-trivial Markov structure are presented. Based on joint work with Hans-Andrea Loeliger.

Guido Montufar, Pennsylvania State University, USA

The number of layers in a Deep Belief Network
Graphical models with several levels of unobserved variables can represent certain probability distributions way more compactly
than models with fewer levels of unobserved variables. The most accepted picture is that, depending on the target distributions, there is a right choice of the number of levels and the number of variables in each level. In this talk I discuss this tradeoff of hyperparameters for the specific class of models deep belief network.

Bala Rajaratnam
, Stanford University

Positive definite completion problems for directed acyclic graphs
A positive definite completion problem pertains to determining whether the unspecified positions of a partial (or incomplete) matrix can be completed in a desired subclass of positive definite matrices. In this talk we consider an important and new class of positive definite completion problems where the desired subclasses are the spaces of covariance and inverse covariance matrices of probabilistic models corresponding to directed acyclic graph models (also known as Bayesian networks). We provide fast procedures that determine whether a partial matrix can be completed in either of these spaces and thereafter proceed to construct the completed matrices. We prove an analog of the positive definite completion result for undirected graphs in the context of directed acyclic graphs, and thus proceed to characterize the class of DAGs which can always be completed. We also proceed to give closed form expressions for the inverse and the determinant of a completed matrix as a function of only the elements of the corresponding partial matrix.

Johannes Rauh, Max Planck Institute for Mathematics in the Sciences, Leipzig

The expressive power of mixture models and Restricted Boltzmann Machines
Mixture models and Restricted Boltzmann Machines (RBMs) are graphical models with hidden nodes that are widely used in statistical modelling and machine learning. They are semi-algebraic sets with a complicated geometry, and a complete description in terms of polynomial equalities and inequalities is unknown. Recent results show that in many cases the dimension of the model equals the expected dimension. Nevertheless, even if the model has the full dimension, there may be large sets of probability distributions that lie outside of the model. This shows that the dimension alone does not suffice to describe the expressive power of these models. In my talk I want to discuss results of a collaboration with Guido Montúfar about sets of probability distributions that RBMs and mixture models can or cannot approximate well.

Alessandro Rinaldo, Carnegie Mellon University

Maximum Likelihood Estimation in Log-Linear Models
We study maximum likelihood estimation in log-linear models under conditional Poisson sampling schemes. We derive necessary and sufficient conditions for existence of the maximum likelihood estimator (MLE) of the model parameters and investigate estimability of the natural and mean-value parameters under a non-existent MLE. Our conditions focus on the role of sampling zeros in the observed table. We situate our results within the general framework of extended exponential families and we rely in a fundamental way on key geometric properties of log-linear models. We propose algorithms for extended maximum likelihood estimation that improve and correct existing algorithms for log-linear model analysis used in virtually all standard statistical software packages. (Joint work with Stephen Fienberg.)

Stable Mixed Graphs
In this talk we discuss classes of graphs with three types of edge that capture the modified independence structure of a directed acyclic graph (DAG) after marginalisation over unobserved variables and conditioning on selection variables using the $m$-separation criterion. These include MC, summary, and ancestral graphs. As a modification of MC graphs we define the class of ribbonless graphs (RGs) that permits the use of the $m$-separation criterion. RGs contain summary and ancestral graphs, and each RG can be generated by a DAG after marginalisation and conditioning. We derive simple algorithms to generate RGs from given DAGs or RGs, and also to generate summary and ancestral graphs in a simple way by further extension of the RG-generating algorithm. By using these algorithms we develop a parallel theory on the three classes, and study the relationships between them as well as the use of each class.

Milan Studeny, Academy of Sciences of the Czech Republic/Institute of Information Theory and Automation}

Integer programming approach to statistical learning graphical models

At first, it will be explained how the statistical task of learning Bayesian network structure, which is a special graphical model of conditional independence structure, can be transformed to a linear optimization task to maximize a certain linear function (determined by data) over a special integral polytope. Then suitable linear transformation of the optimization problem will be described, which allows to consider a polytope with zero-one vectors as vertices. Finally, by means of the concept of LP relaxation of the polytope, the transformed optimization task will be re-formulated in the form an integer programming problem, which seems to offer the way to solve the optimization task in practice.
The talk is based on joint research with Raymond Hemmecke, Silvia Lindner (both TU Munich) and David Haws (University of Kentucky).

Seth Sullivant,North Carolina State University

Identifiability of Large Phylogenetic Mixture Models
Phylogenetic mixture models are statistical models of character evolution allowing for heterogeneity. Each of the classes in some unknown partition of the characters may evolve by different processes, or even along different trees. The fundamental question of whether parameters of such a model are identifiable is difficult to address, due to the complexity of the parameterization. We analyze mixture models on large trees, with many mixture components, showing that both numerical and tree parameters are indeed identifiable in these models when all trees are the same. The techniques we employ come from the algebraic geometry of tensors and are likely to be useful for other hidden variable graphical models. This is joint work with John Rhodes (U. of Alaska--Fairbanks)

Henry Wynn, London School of Economics, UK

Hierarchical models and monomial ideals
The connections between graphical models and certain algebraic structures, under a heading of algebraic statistics", for example the link with toric ideals, is well-established. The links have primarily been made for Poisson/multinomial models and for the Gaussian case. Here it is shown how to introduce algebraic concepts into quite general hierarchical and graphical models. The concept of a local (differential) cumulant is developed and related to hierarchical models, covering independence and conditional independence in a natural way. Then, the isomorphism between a type of differential ideal and a monomial ideal is used to complete the connection.
This allows a translation between certain algebraic properties such as {\em 2-linear} and {\em shellable} and conditions for hierarchical models and with statistical interpretations such as decomposability and the lattice conditional independence (LCI) property. In theory, interesting classes of ideal should thus lead to interesting, and possible new, classes of statistical models. A few examples are given. The differential ideal property also leads to special families of distributions.

Jing Xi, University of Kentucky

Estimating the number of zero-one multi-way tables via sequential importance sampling
In 2005, Chen et al introduced a sequential importance sampling (SIS) procedure to analyze zero-one two-waytables with given fixed marginal sums (row and column sums) via the conditional Poisson (CP) distribution. They showed that compared with Monte Carlo Markov chain (MCMC)-based approaches, their importance sampling method is more efficient in terms of running time and also provides an easy and accurate estimate of the total number of contingency tables with fixed marginal sums. In this talk we extend their result to zero-one multi-way ($d$-way, $d \geq 2$) contingency tables under the no $d$-way interaction model, i.e., with fixed $d - 1$ marginal sums. Also we show by simulations that the SIS procedure with CP distribution to estimate the number of zero-one three-way tables under the no three-way interaction model given marginal sums works very well even with some rejections. We also applied our method to Samson's monks' data set. We end with further questions on the SIS procedure on zero-one multi-way tables.

Ruriko Yoshida, University of Kentucky

Normality of the three-state toric homogeneous Markov chain model
A discrete time Markov chain, a stochastic process with Markov property, has been applied in many fields, such as physics, chemistry, information sciences, economics, finances, mathematical biology, social sciences, and statistics. A discrete time Markov chain is often used as a statistical model from a random physical process to fit the observe the data. A time-homogeneous Markov chain is a process that each transition probability from a state to a state does not depend on time. It is important to test if the observed data set fits the assumption of the time-homogeneity of the chain and in 2011, Hara and Takemura suggested a Markov Chain Monte Carlo (MCMC) approach to a goodness-of-fit test for the time-homogeneity of the chain using Markov bases. In this talk we focus on a time-homogeneous Markov chain (THMC) model with three states and without any self-loops, i.e., the probability from a state to itself is zero, and we study algebraic and polyhedral geometric properties of Markov bases for the THMC model with any time $T > 0$. More specifically, we showed the number of facets of the convex hull generated by the columns of the design matrix for the three-state THMC model without loops for any time $T \geq 5$ is $24$, and we showed its explicit hyperplane representation of the convex hull for any $T \geq 4$. Using this result we then showed that the semigroup generated by the columns of the design matrix is normal for any $T > 0$ and finally we showed that a Markov basis for the toric ideal associate with the design matrix for the three-state THMC model without loops consists of binomials of degree less than or equal to $d= 6$. We end this talk with conjectures on further analyses on Markov bases for the THMC model.

This is joint work with D. Haws, A., Mart\'in del Campo, and A. Takemura.

Piotr Zwiernik, TU Eindhoven

Graphical Gaussian models and their groups
Let $G$ be an undirected graph with $n$ nodes defining the graphical Gaussian model $M(G)$ parametrized by the set $P(G)$ of all concentration (symmetric, positive definite) matrices with zeros corresponding to non-edges of $G$. We describe the stabilizer in $GL_n(\mathbb{R})$ of the model, in the natural action of $GL_n(\mathbb{R})$ on symmetric matrices. This group gives the representation of graphical Gaussian models as composite transformation families. This has important consequences for the study of the maximum likelihood inference for this model class which I hope to discuss in the second part of the talk.
This is a joint work with Jan Draisma and Sonja Kuhnt.