Discovering Bayesian Networks in Incomplete Databases
Bayesian Belief Networks (BBNs) are becoming increasingly popular in the Knowledge Discovery and Data Mining community. A BBN is defined by a graphical structure of conditional dependencies among the domain variables and a set of probability distributions that quantify these dependencies. BBNs thus provide a compact formalism, grounded in the well-developed mathematics of probability theory, that can predict variable values, explain observations, and visualize dependencies among variables. During the past few years, several efforts have been devoted to developing methods that extract both the graphical structure and the conditional probabilities of a BBN from a database. All these methods share the assumption that the database at hand is complete, that is, that it reports no entry as unknown. When this assumption fails, these methods have to resort to expensive iterative procedures that are infeasible for large databases. This paper describes a new Knowledge Discovery system based on an efficient method that extracts the graphical structure and the probability distributions of a BBN from possibly incomplete databases. An application to a large real-world database illustrates the methods and concepts underlying the system and assesses its advantages as a Knowledge Discovery system.
1. Knowledge Media Institute, The Open University.
2. Department of Actuarial Science and Statistics, City University.
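
The definition of a BBN given in the abstract (a graph of conditional dependencies plus one probability distribution per variable) can be illustrated with a minimal sketch. The three-variable network, its variable names, and all probability values below are hypothetical and chosen only for illustration; they do not come from the paper, and the brute-force enumeration shown here is not the extraction method the paper describes.

```python
from itertools import product

# Hypothetical structure: Rain -> WetGrass <- Sprinkler.
# Each variable maps to the list of its parents in the graph.
parents = {
    "Rain": [],
    "Sprinkler": [],
    "WetGrass": ["Rain", "Sprinkler"],
}

# Hypothetical conditional probability tables:
# P(variable = True | parent values), keyed by the tuple of parent values.
cpt = {
    "Rain": {(): 0.2},
    "Sprinkler": {(): 0.1},
    "WetGrass": {
        (True, True): 0.99,
        (True, False): 0.9,
        (False, True): 0.8,
        (False, False): 0.0,
    },
}


def joint(assignment):
    """Joint probability of a full assignment, as the product of each
    variable's conditional probability given its parents."""
    p = 1.0
    for var, pars in parents.items():
        p_true = cpt[var][tuple(assignment[q] for q in pars)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def posterior(query, evidence):
    """P(query = True | evidence), computed by summing the joint
    distribution over all variables not fixed by the evidence."""
    hidden = [v for v in parents if v not in evidence]
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(hidden)):
        assignment = dict(evidence, **dict(zip(hidden, values)))
        p = joint(assignment)
        denominator += p
        if assignment[query]:
            numerator += p
    return numerator / denominator


# Explaining an observation: how likely is rain once wet grass is seen?
p_rain_given_wet = posterior("Rain", {"WetGrass": True})
```

In this sketch, observing wet grass raises the probability of rain above its prior of 0.2, which is the sense in which a BBN can "explain observations". Enumeration is exponential in the number of variables and is used here only because the example is tiny.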