Cheriton School of Computer Science

University of Waterloo

Several important tasks in machine learning can be formulated as combinatorial optimization problems. For example, learning the structure of a Bayesian network from data can be cast as a combinatorial optimization problem: a scoring function measures how well a candidate structure is supported by the observed data, and the task is to find a structure that optimizes the score (in the minimization convention used here, the structure of lowest cost). As a second example, learning a decision tree from labeled data can be formulated as a combinatorial optimization problem, where the goal is to find the decision tree that best fits the data subject to regularization constraints. Both of these problems are NP-hard. Our approach is to apply constraint programming and other advanced search techniques.
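
To make the score-and-search formulation concrete, here is a minimal sketch that computes a negated BIC score for a candidate structure on discrete data, so that lower is better; the function name, data layout, and parent-set encoding are illustrative, not taken from the papers below.

```python
import math
from collections import Counter

def bic_score(data, parents):
    """Negated BIC score of a candidate DAG on discrete data (lower is
    better). `data` is a list of tuples, one per instance; `parents`
    maps each variable index to a tuple of its parent indices.
    Illustrative only -- not the scoring code used in the papers."""
    n_instances = len(data)
    total = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, child value) pairs and of
        # parent configurations alone.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Penalty: (log N / 2) times the number of free parameters,
        # estimating each variable's arity from the observed values.
        r = len({row[var] for row in data})
        q = 1
        for p in pa:
            q *= len({row[p] for row in data})
        total += -loglik + 0.5 * math.log(n_instances) * q * (r - 1)
    return total
```

Because the score decomposes over variables, a solver can precompute a local score for every (variable, candidate parent set) pair and then search only over structures, which is what makes the combinatorial formulation tractable to attack.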

(PDF)
Peter van Beek and Hella-Franziska Hoffmann.
Machine learning of Bayesian networks using
constraint programming.
*Proceedings of CP 2015,*
Cork, Ireland, August, 2015.

Bayesian networks are a widely used graphical model
with diverse applications in knowledge discovery,
classification, prediction, and control. Learning a
Bayesian network from discrete data can be cast as a
combinatorial optimization problem and there has been
much previous work on applying optimization techniques
including proposals based on ILP, A* search, depth-first
branch-and-bound (BnB) search, and breadth-first BnB
search. In this paper, we present a *constraint-based*
depth-first BnB approach for solving the Bayesian network
learning problem. We propose an improved constraint
model that includes powerful dominance constraints,
symmetry-breaking constraints, cost-based pruning rules,
and an acyclicity constraint for effectively pruning
the search for a minimum cost solution to the model. We
experimentally evaluated our approach on a representative
suite of benchmark data. Our empirical results compare
favorably to the best previous approaches, both in the
number of instances solved within specified resource
bounds and in solution time.
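
As a rough illustration of cost-based pruning in a depth-first BnB search over variable orderings (a generic sketch, not the CPBayes constraint model itself), assume a hypothetical helper `local_score(v, placed)` that returns variable v's cheapest local score over parent sets drawn from the already-placed variables, and a per-variable lower bound `best_scores[v]` given by v's unrestricted best parent set:

```python
def bnb_orderings(best_scores, local_score):
    """Depth-first branch-and-bound over variable orderings. Placing the
    variables in some order guarantees acyclicity; each variable takes
    its cheapest parent set among the variables placed before it.
    `best_scores[v]` is an optimistic (lower) bound on v's local score,
    used to prune. Illustrative only."""
    n = len(best_scores)
    best = [float('inf')]

    def search(placed, remaining, cost):
        if not remaining:
            best[0] = min(best[0], cost)
            return
        # Cost-based pruning: partial cost plus an optimistic bound on
        # the unplaced variables cannot beat the incumbent.
        if cost + sum(best_scores[v] for v in remaining) >= best[0]:
            return
        for v in list(remaining):
            search(placed | {v}, remaining - {v},
                   cost + local_score(v, frozenset(placed)))

    search(frozenset(), frozenset(range(n)), 0.0)
    return best[0]
```

The real solver adds much more on top of this skeleton (dominance and symmetry-breaking constraints, a dedicated acyclicity constraint), but the interplay of an incumbent cost and a lower bound is the core of the pruning.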

Code and data sets for the results described in the paper: README.txt, CPBayes.zip, Scores.zip

Table 1.
For each benchmark set of discrete data,
time (seconds) to determine a minimal-cost BN using
various systems (see text) and the BDeu and BIC scoring methods,
where
*n* is the number of random variables in the data set,
*N* is the number of instances in the data set, and
*d* is the total number of possible parents for the
random variables.
Resource limits of 24 hours of CPU time and 16 GB of memory were
imposed: OM = out of memory; OT = out of time.
A blank entry indicates that the preprocessing step of obtaining
the local scores for each random variable could not be completed
within the resource limits.

(PDF)
Colin Lee and Peter van Beek.
Metaheuristics for Score-and-Search
Bayesian Network Structure Learning.
*Proceedings of the 30th Canadian Conference on
Artificial Intelligence,*
Edmonton, Alberta, May, 2017.

Structure optimization is one of the two key components of score-and-search based Bayesian network learning. Extending previous work on ordering-based search (OBS), we present new local search methods for structure optimization that scale to upwards of a thousand variables. We analyze different aspects of local search with respect to OBS that guided the construction of our methods. Our improvements include an efficient traversal method for a larger neighbourhood and the use of more complex metaheuristics (iterated local search and a memetic algorithm). We compared our methods against others on test instances generated from real data, and they consistently outperformed the state of the art by a significant margin.
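
The flavour of ordering-based local search can be sketched as plain hill climbing over adjacent swaps in the ordering (far simpler than the iterated local search and memetic methods of the paper); `local_score` is the same kind of hypothetical helper as above, returning a variable's cheapest local score over parent sets drawn from the variables placed before it:

```python
def obs_hill_climb(order, local_score):
    """Hill climbing over variable orderings: each ordering induces a DAG
    by giving every variable its cheapest parent set among its
    predecessors; adjacent swaps define the neighbourhood. A sketch of
    the OBS idea, not the code released with the paper."""
    def total(ordering):
        cost, placed = 0.0, set()
        for v in ordering:
            cost += local_score(v, frozenset(placed))
            placed.add(v)
        return cost

    order = list(order)
    best = total(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            order[i], order[i + 1] = order[i + 1], order[i]
            cand = total(order)
            if cand < best:
                best, improved = cand, True
            else:
                # Undo a non-improving swap.
                order[i], order[i + 1] = order[i + 1], order[i]
    return order, best
```

Searching over orderings rather than DAGs sidesteps the acyclicity constraint entirely, which is one reason this family of methods scales to instances with upwards of a thousand variables.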

Code and data sets for the results described in the paper: https://github.com/kkourin/mobs

Table 1: Median score of best networks found, for various benchmarks, where *n* is
the number of variables and *C* is the total number of candidate parent sets. The
column labelled INOBS represents the scores of INOBS with random restarts.
OM indicates that the solver ran out of memory before any solution was
output. Bold indicates the score was the best found amongst all tested
methods. An asterisk indicates that the score is known to be optimal.