Cheriton School of Computer Science
University of Waterloo
Several important tasks in machine learning can be formulated as combinatorial optimization problems. For example, learning the structure of a Bayesian network from data can be formulated as a combinatorial optimization problem, where a score is defined that measures how well a candidate structure is supported by the observed data and the task is to find the structure with the lowest score. As a second example, learning a decision tree from labeled data can be formulated as a combinatorial optimization problem, where the goal is to find the decision tree that best fits the data subject to regularization constraints. Both of these problems are NP-Hard. Our approach is to apply constraint programming and other advanced search techniques.
(PDF) Peter van Beek and Hella-Franziska Hoffmann. Machine learning of Bayesian networks using constraint programming. Proceedings of CP 2015, Cork, Ireland, August, 2015.
Bayesian networks are a widely used graphical model with diverse applications in knowledge discovery, classification, prediction, and control. Learning a Bayesian network from discrete data can be cast as a combinatorial optimization problem and there has been much previous work on applying optimization techniques including proposals based on ILP, A* search, depth-first branch-and-bound (BnB) search, and breadth-first BnB search. In this paper, we present a constraint-based depth-first BnB approach for solving the Bayesian network learning problem. We propose an improved constraint model that includes powerful dominance constraints, symmetry-breaking constraints, cost-based pruning rules, and an acyclicity constraint for effectively pruning the search for a minimum cost solution to the model. We experimentally evaluated our approach on a representative suite of benchmark data. Our empirical results compare favorably to the best previous approaches, both in terms of number of instances solved within specified resource bounds and in terms of solution time.
Table 1. For each benchmark set of discrete data, time (seconds) to determine minimal cost BN using various systems (see text) and the BDeu and BIC scoring methods, where n is the number of random variables in the data set, N is the number of instances in the data set, and d is the total number of possible parents for the random variables. Resource limits of 24 hours of CPU time and 16 GB of memory were imposed: OM = out of memory; OT = out of time. A blank entry indicates that the preprocessing step of obtaining the local scores for each random variable could not be completed within the resource limits.
(PDF) Colin Lee and Peter van Beek. Metaheuristics for Score-and-Search Bayesian Network Structure Learning. Proceedings of the 30th Canadian Conference on Artificial Intelligence, Edmonton, Alberta, May, 2017.
Structure optimization is one of the two key components of score-and-search based Bayesian network learning. Extending previous work on ordering-based search (OBS), we present new local search methods for structure optimization which scale to upwards of a thousand variables. We analyze different aspects of local search with respect to OBS that guided us in the construction of our methods. Our improvements include an efficient traversal method for a larger neighbourhood and the usage of more complex metaheuristics (iterated local search and memetic algorithm). We compared our methods against others using test instances generated from real data, and they consistently outperformed the state of the art by a significant margin.
Code and data sets for the results described in the paper: https://github.com/kkourin/mobs
Table 1: Median score of best networks found, for various benchmarks, where n is the number of variables and C is the total number of candidate parent sets. The column labelled INOBS represents the scores of INOBS with random restarts. OM indicates the solver runs out of memory before any solution is output. Bold indicates the score was the best found amongst all tested methods. An asterisk indicates that the score is known to be optimal