Machine Learning and Friends Lunch






Learning Relational Probability Trees


Jennifer Neville
UMass

Abstract


Classification trees are widely used in the machine learning and data mining communities for modeling propositional data, and recent work has extended this basic paradigm to probability estimation trees. Tree models are often easy to interpret, owing to their intuitive representation of knowledge and their selectivity, which makes them an attractive modeling approach for the knowledge discovery community. However, conventional tree learning algorithms were designed for data sets in which the instances are homogeneous and statistically independent. In this talk we present an algorithm for learning relational probability trees (RPTs), which extend standard probability estimation trees to a relational setting in which data instances are heterogeneous and interdependent. Our algorithm for learning the structure and parameters of an RPT searches over a space of relational features that use aggregation functions (e.g., average, mode, count) to dynamically flatten relational data and create dichotomous divisions within the tree. Our recent work on relational learning has examined how particular characteristics of relational data affect the statistical inferences necessary for accurate learning. We have identified three such characteristics -- concentrated linkage, degree disparity, and relational autocorrelation -- and have shown how they can greatly complicate efforts to construct good statistical models. The RPT algorithm uses a novel form of randomization test to adjust for these biases. On a variety of relational learning tasks, we will show that RPTs built using randomization tests are significantly smaller than other models while achieving equivalent or better performance.
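
To make the aggregation step concrete, here is a minimal Python sketch of how aggregation functions can flatten a variable-size set of linked objects into fixed-size propositional features. The movie/actor schema and the function names are hypothetical illustrations, not the speaker's implementation.

    from collections import Counter
    from statistics import mean

    # Minimal sketch: summarize a variable-size set of linked objects with
    # fixed-size propositional features using count, average, and mode.
    # The schema (actors with an "age" attribute) is a hypothetical example.
    def aggregate_features(linked_objects, attr):
        values = [obj[attr] for obj in linked_objects]
        return {
            f"count({attr})": len(values),
            f"average({attr})": mean(values) if values else 0.0,
            f"mode({attr})": Counter(values).most_common(1)[0][0] if values else None,
        }

    # A movie linked to three actors of varying ages.
    actors = [{"age": 35}, {"age": 41}, {"age": 35}]
    print(aggregate_features(actors, "age"))
    # {'count(age)': 3, 'average(age)': 37.0, 'mode(age)': 35}

A dichotomous division in the tree would then threshold one of these derived features, e.g. average(age) > 38, even though each instance has a different number of linked objects.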
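The abstract does not spell out the form of the randomization test, so the following is only a generic permutation-test sketch, under the assumption that a candidate feature's score is compared against scores obtained from randomly reassigned class labels. The actual RPT test is specialized to correct for concentrated linkage, degree disparity, and autocorrelation, which this simplified version does not capture.

    import random

    # Generic randomization (permutation) test sketch: estimate how often a
    # random relabeling of instances scores at least as well as the observed
    # feature/label association. NOTE: an assumed, simplified stand-in; the
    # RPT's novel test additionally corrects for relational biases.
    def randomization_p_value(score_fn, feature_values, labels, n_rounds=1000, seed=0):
        rng = random.Random(seed)
        observed = score_fn(feature_values, labels)
        shuffled = list(labels)
        hits = 0
        for _ in range(n_rounds):
            rng.shuffle(shuffled)
            if score_fn(feature_values, shuffled) >= observed:
                hits += 1
        return (hits + 1) / (n_rounds + 1)  # add-one smoothed p-value estimate

    # Toy score: difference in positive-class rate across a binary split
    # (a stand-in for a chi-square-style association statistic).
    def split_score(feature_values, labels):
        left = [y for x, y in zip(feature_values, labels) if x]
        right = [y for x, y in zip(feature_values, labels) if not x]
        if not left or not right:
            return 0.0
        return abs(sum(left) / len(left) - sum(right) / len(right))

    features = [True, True, False, False, True, False]
    labels = [1, 1, 0, 0, 1, 0]
    print(randomization_p_value(split_score, features, labels))

Under a test of this general kind, splits whose scores are no better than chance can be rejected, which is one plausible route to the smaller trees reported in the abstract.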
