Combining Labelled and Unlabelled Data in the Design of Pattern Classification Systems

Gabrys, Bogdan

Gabrys, B., 2002. Combining Labelled and Unlabelled Data in the Design of Pattern Classification Systems. In: Hybrid Methods for Adaptive Systems (HMAS'2002) Workshop, 20 September 2002, Albufeira, Portugal.

Full text available as:

PDF: Gabrys_Petrakieva_EUNITE2002.pdf - Published Version (82kB)

Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk.

Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.

Abstract

There has been much interest in techniques that incorporate knowledge from unlabelled data into a supervised learning system, but less effort has been made to compare the effectiveness of different approaches on real-world problems and to analyse the behaviour of the learning system when different amounts of unlabelled data are used. This paper presents an analysis of the performance of supervised methods reinforced by unlabelled data, and of some semi-supervised approaches, using different ratios of labelled to unlabelled samples. The experimental results show that, when supported by unlabelled samples, much less labelled data is generally required to build a classifier without compromising classification performance. If only a very limited amount of labelled data is available, the results show high variability, and the performance of the final classifier depends more on how reliable the labelled data samples are than on the use of additional unlabelled data. Semi-supervised clustering utilising both labelled and unlabelled data has been shown to offer the most significant improvements when natural clusters are present in the considered problem.

Item Type:	Conference or Workshop Item (Paper)
Group:	Faculty of Science & Technology
ID Code:	8608
Deposited By:	INVALID USER
Deposited On:	21 Dec 2008 17:29
Last Modified:	14 Mar 2022 13:20
