On accuracy of PDF divergence estimators and their applicability to representative data sampling.

Budka, M., Gabrys, B. and Musial, K., 2011. On accuracy of PDF divergence estimators and their applicability to representative data sampling. Entropy, 13 (7), 1229-1266.

Full text available as:

PDF: entropy-13-01229.pdf - Published Version (4MB)

Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk.

Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.

Official URL: http://www.mdpi.com/1099-4300/13/7/1229/

DOI: 10.3390/e13071229

Abstract

Generalisation error estimation is an important issue in machine learning. Cross-validation, traditionally used for this purpose, requires building multiple models and repeating the whole procedure many times in order to produce reliable error estimates. It is, however, possible to accurately estimate the error using only a single model, if the training and test data are chosen appropriately. This paper investigates the possibility of using various probability density function (PDF) divergence measures for the purpose of representative data sampling. As it turned out, the first difficulty one needs to deal with is estimation of the divergence itself. In contrast to other publications on this subject, the experimental results provided in this study show that in many cases accurate estimation is not possible unless samples consisting of thousands of instances are used. Exhaustive experiments on divergence-guided representative data sampling have been performed using 26 publicly available benchmark datasets and 70 PDF divergence estimators, and their results have been analysed and discussed.
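
To make the estimation problem concrete, the following is a minimal sketch of one nearest-neighbour estimator of the Kullback-Leibler divergence from two finite samples. It is an illustration only and is not taken from the paper: the names `knn_kl_divergence` and `gaussian_sample`, the choice of estimator, and the Gaussian test distributions are all assumptions made here. For two unit-variance Gaussians with means 0 and 1, the true divergence is 0.5, which gives a reference value against which estimates at different sample sizes can be compared.

```python
import math
import random


def knn_kl_divergence(x, y, k=1):
    """Nearest-neighbour estimate of KL(P || Q) from samples x ~ P and y ~ Q.

    x and y are lists of equal-length tuples of floats. Assumes continuous
    data with no duplicate points (a zero distance would break the log).
    """
    n, m, d = len(x), len(y), len(x[0])
    total = 0.0
    for i, xi in enumerate(x):
        # Distance from xi to its k-th nearest neighbour among the other x's.
        rho = sorted(math.dist(xi, xj) for j, xj in enumerate(x) if j != i)[k - 1]
        # Distance from xi to its k-th nearest neighbour among the y's.
        nu = sorted(math.dist(xi, yj) for yj in y)[k - 1]
        total += math.log(nu / rho)
    # Average log distance ratio, scaled by dimension, plus a sample-size term.
    return d * total / n + math.log(m / (n - 1))


def gaussian_sample(n, mean, rng):
    """Hypothetical helper: n one-dimensional draws from N(mean, 1)."""
    return [(rng.gauss(mean, 1.0),) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # True KL(N(0,1) || N(1,1)) is 0.5; watch how the estimate moves with n.
    for n in (50, 200, 800):
        p = gaussian_sample(n, 0.0, rng)
        q = gaussian_sample(n, 1.0, rng)
        print(n, round(knn_kl_divergence(p, q), 3))
```

Running the script with different seeds gives a rough feel for how much such an estimate can fluctuate when the samples are small. The brute-force neighbour search is quadratic in the sample size and is chosen here for brevity, not efficiency.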

Item Type:	Article
ISSN:	1099-4300
Uncontrolled Keywords:	cross-validation; divergence estimation; generalisation error estimation; Kullback-Leibler divergence; sampling
Group:	Faculty of Science & Technology
ID Code:	18405
Deposited By:	Dr Marcin Budka
Deposited On:	11 Aug 2011 11:28
Last Modified:	14 Mar 2022 13:39
