Machine Learning, , 1–37 ()
© Kluwer Academic Publishers, Boston. Manufactured in The Netherlands.
Naive Bayesian Classification of Structured Data
PETER A. FLACH
Department of Computer Science, University of Bristol, United Kingdom
LSIIT, Université Louis Pasteur, Strasbourg, France
In this paper we present 1BC and 1BC2, two systems that perform naive Bayesian classification of
structured individuals. The approach of 1BC is to project the individuals along first-order features. These features
are built from the individual using structural predicates referring to related objects (e.g. atoms within molecules),
and properties applying to the individual or one or several of its related objects (e.g. a bond between two atoms).
We describe an individual in terms of elementary features consisting of zero or more structural predicates and
one property; these features are treated as conditionally independent in the spirit of the naive Bayes assumption.
1BC2 represents an alternative first-order upgrade to the naive Bayesian classifier by considering probability
distributions over structured objects (e.g., a molecule as a set of atoms), and estimating those distributions from
the probabilities of its elements (which are assumed to be independent). We present a unifying view on both
systems in which 1BC works in language space, and 1BC2 works in individual space. We also present a new,
efficient recursive algorithm improving upon the original propositionalisation approach of 1BC. Both systems
have been implemented in the context of the first-order descriptive learner Tertius, and we investigate the
differences between the two systems both in computational terms and on artificially generated data. Finally, we
describe a range of experiments on ILP benchmark data sets demonstrating the viability of our approach.
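The individual-space idea summarised above (a molecule viewed as a collection of atoms whose elements are assumed independent) can be illustrated with a small sketch. This is not the 1BC or 1BC2 implementation: the bag-of-elements likelihood, the Laplace smoothing parameter `alpha`, the function names `train` and `classify`, and the toy molecules are all assumptions made for illustration only.

```python
import math
from collections import Counter

def train(molecules, labels, alpha=1.0):
    """Estimate class priors and smoothed per-class atom-type probabilities.

    Each molecule is a list of atom-type symbols; alpha is an assumed
    Laplace smoothing constant, not a parameter taken from the paper.
    """
    classes = sorted(set(labels))
    vocab = sorted({atom for mol in molecules for atom in mol})
    class_counts = Counter(labels)
    atom_counts = {c: Counter() for c in classes}
    for mol, c in zip(molecules, labels):
        atom_counts[c].update(mol)
    priors = {c: class_counts[c] / len(labels) for c in classes}
    likelihoods = {}
    for c in classes:
        total = sum(atom_counts[c].values()) + alpha * len(vocab)
        likelihoods[c] = {a: (atom_counts[c][a] + alpha) / total for a in vocab}
    return priors, likelihoods

def classify(mol, priors, likelihoods):
    """Return the class maximising log P(c) + sum of log P(atom | c)."""
    scores = {}
    for c in priors:
        score = math.log(priors[c])
        for atom in mol:
            # Atom types never seen in training are skipped (an assumption).
            if atom in likelihoods[c]:
                score += math.log(likelihoods[c][atom])
        scores[c] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: molecules as bags of atom types.
MOLECULES = [["c", "c", "o"], ["c", "o", "o"], ["n", "n", "c"], ["n", "c", "n"]]
LABELS = ["pos", "pos", "neg", "neg"]

priors, likelihoods = train(MOLECULES, LABELS)
prediction = classify(["o", "c"], priors, likelihoods)
```

Working in log space avoids numerical underflow when an individual contains many elements. The sketch ignores bonds and other relations between atoms, which the first-order features described above are designed to capture.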
Probabilistic approaches to classification typically involve modelling the conditional probability
distribution P(C