SAT-based Decision Tree Learning for Large Data Sets | Journal of Artificial Intelligence Research

Published: Jul 3, 2024

DOI: https://doi.org/10.1613/jair.1.15956

Keywords: Boolean satisfiability, decision trees

Andre Schidler

TU Wien

https://orcid.org/0000-0001-6790-7158

Stefan Szeider

TU Wien

https://orcid.org/0000-0001-8994-1656

Abstract

Decision trees of low depth are beneficial for understanding and interpreting the data they represent. Unfortunately, finding a decision tree of lowest complexity (depth or size) that correctly represents given data is NP-hard. Hence known algorithms either (i) utilize heuristics that do not minimize the depth or (ii) are exact but scale only to small or medium-sized instances. We propose a new hybrid approach to decision tree learning, combining heuristic and exact methods in a novel way. More specifically, we employ SAT encodings repeatedly to local parts of a decision tree provided by a standard heuristic, leading to an overall reduction in complexity. This allows us to scale the power of exact SAT-based methods to comparatively very large data sets. We evaluate our new approach experimentally on a range of real-world instances that contain up to several thousand samples. In almost all cases, our method successfully decreases the complexity of the initial decision tree; often, the decrease is significant.

Issue

Vol. 80 (2024)

Section

Articles
