
Goldsmiths - University of London

PIDT: A Novel Decision Tree Algorithm Based on Parameterised Impurities and Statistical Pruning Approaches

Stamate, Daniel; Alghamdi, Wajdi; Stahl, Daniel; Logofatu, Doina and Zamyatin, Alexander. 2018. PIDT: A Novel Decision Tree Algorithm Based on Parameterised Impurities and Statistical Pruning Approaches. In: Artificial Intelligence Applications and Innovations, vol. 519. Springer, pp. 273-284. [Book Section]

CameraReady_paper73.pdf - Accepted Version
Available under License Creative Commons Attribution Non-commercial.

Abstract or Description

In the process of constructing a decision tree, the criteria for selecting the splitting attributes influence the performance of the model produced by the decision tree algorithm. The most well-known criteria, such as Shannon entropy and the Gini index, suffer from a lack of adaptability to the datasets. This paper presents novel splitting attribute selection criteria based on families of parameterised impurities that we propose here to be used in the construction of optimal decision trees. These criteria rely on families of strict concave functions that define the new generalised parameterised impurity measures, which we applied in devising and implementing our novel PIDT decision tree algorithm. This paper also proposes the S-condition, based on statistical permutation tests, whose purpose is to ensure that the reduction in impurity, or gain, for the selected attribute is statistically significant. We implemented the S-pruning procedure, based on the S-condition, to prevent model overfitting. These methods were evaluated on a number of simulated and benchmark datasets. Experimental results suggest that by tuning the parameters of the impurity measures and by using our S-pruning method, we obtain better decision tree classifiers with the PIDT algorithm.
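The two ideas in the abstract can be sketched in code. The impurity family below is illustrative only (a Tsallis-style generalisation that recovers Shannon entropy as alpha approaches 1 and the Gini index at alpha = 2), not the paper's exact parameterised family, and the `s_condition` function is a generic permutation test on impurity gain in the spirit of the described S-condition; all function names and parameter defaults are assumptions for illustration.

```python
import numpy as np

def parameterised_impurity(labels, alpha=1.0):
    """Illustrative parameterised impurity (hypothetical form, not the
    paper's exact family): recovers Shannon entropy as alpha -> 1 and
    the Gini index at alpha = 2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))                 # Shannon entropy
    return (1.0 - np.sum(p ** alpha)) / (alpha - 1.0)  # generalised family

def impurity_gain(parent, left, right, alpha=1.0):
    """Reduction in impurity obtained by splitting `parent` into
    `left` and `right` child nodes."""
    n = len(parent)
    weighted = (len(left) / n) * parameterised_impurity(left, alpha) \
             + (len(right) / n) * parameterised_impurity(right, alpha)
    return parameterised_impurity(parent, alpha) - weighted

def s_condition(parent, left, right, alpha=1.0,
                n_perm=1000, sig=0.05, seed=0):
    """Permutation test in the spirit of the S-condition: accept the split
    only if the observed gain is statistically significant, i.e. it is
    rarely matched when the class labels are randomly permuted."""
    rng = np.random.default_rng(seed)
    observed = impurity_gain(parent, left, right, alpha)
    n_left = len(left)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(parent)          # shuffle class labels
        g = impurity_gain(parent, perm[:n_left], perm[n_left:], alpha)
        if g >= observed:
            exceed += 1
    p_value = (exceed + 1) / (n_perm + 1)       # add-one permutation p-value
    return p_value < sig
```

For a perfectly separating split (e.g. all class-0 samples to the left, all class-1 to the right), the observed gain equals the parent impurity and the permutation p-value is very small, so the split is accepted; a split no better than random label shuffles would fail the condition and, under S-pruning, would not be grown.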

Item Type: Book Section

Identification Number (DOI):

https://doi.org/10.1007/978-3-319-92007-8_24

Keywords:

Machine Learning, Decision trees, Parameterised impurity measures, Concave functions, Optimisation, Preventing overfitting, Statistical pruning, Permutation test, Significance level

Departments, Centres and Research Units:

Computing

Dates:

8 March 2018: Accepted
22 May 2018: Published

Item ID:

24124

Date Deposited:

14 Sep 2018 15:07

Last Modified:

14 Sep 2018 15:07

URI: http://research.gold.ac.uk/id/eprint/24124

