Eastwood, M., 2010. Building well-performing classifier ensembles: model and decision level combination. PhD Thesis (PhD). Bournemouth University.
Full text available as:
There is a continuing drive for better, more robust generalisation performance from classification systems, and prediction systems in general. Ensemble methods, or the combining of multiple classifiers, have become an accepted and successful tool for doing this, though the reasons for success are not always entirely understood. In this thesis, we review the multiple classifier literature and consider the properties an ensemble of classifiers - or collection of subsets - should have in order to be combined successfully. We find that the framework of Stochastic Discrimination provides a well-defined account of these properties, which are shown to be strongly encouraged in a number of the most popular/successful methods in the literature via differing algorithmic devices. This uncovers some interesting and basic links between these methods, and aids understanding of their success and operation in terms of a kernel induced on the training data, with form particularly well suited to classification. One property that is desirable in both the SD framework and in a regression context, the ambiguity decomposition of the error, is de-correlation of individuals. This motivates the introduction of the Negative Correlation Learning method, in which neural networks are trained in parallel in a way designed to encourage de-correlation of the individual networks. The training is controlled by a parameter λ governing the extent to which correlations are penalised. Theoretical analysis of the dynamics of training results in an exact expression for the interval in which we can choose λ while ensuring stability of the training, and a value λ∗ for which the training has some interesting optimality properties. These values depend only on the size N of the ensemble. Decision level combination methods often result in a difficult to interpret model, and NCL is no exception. However in some applications, there is a need for understandable decisions and interpretable models. In response to this, we depart from the standard decision level combination paradigm to introduce a number of model level combination methods. As decision trees are one of the most interpretable model structures used in classification, we chose to combine structure from multiple individual trees to build a single combined model. We show that extremely compact, well performing models can be built in this way. In particular, a generalisation of bottom-up pruning to a multiple-tree context produces good results in this regard. Finally, we develop a classification system for a real-world churn prediction problem, illustrating some of the concepts introduced in the thesis, and a number of more practical considerations which are of importance when developing a prediction system for a specific problem.
|Item Type:||Thesis (PhD)|
|Additional Information:||If you feel that this work infringes your copyright please contact the BURO Manager.|
|Subjects:||Generalities > Computer Science and Informatics|
|Group:||Faculty of Science and Technology|
|Deposited By:||Mrs Jill Burns|
|Deposited On:||03 Aug 2011 13:18|
|Last Modified:||10 Sep 2014 15:52|
Document DownloadsMore statistics for this item...
|Repository Staff Only -|
|BU Staff Only -|
|Help Guide -||Editing Your Items in BURO|