Budka, M., 2010. Physically inspired methods and development of data-driven predictive systems. Doctoral Thesis (Doctoral). Bournemouth University.
Full text available as: PDF, Budka,_Marcin_Ph.D._2010.pdf (8MB)
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Abstract
Traditionally, the building of predictive models is perceived as a combination of science and art. Although the designer of a predictive system effectively follows a prescribed procedure, his domain knowledge as well as expertise and intuition in the field of machine learning are often irreplaceable. However, in many practical situations it is possible to build well-performing predictive systems by following a rigorous methodology and using computational power to offset not only the lack of domain knowledge but also a partial lack of expertise and intuition. The generalised predictive model development cycle discussed in this thesis is an example of such a methodology, which, despite being computationally expensive, has been successfully applied to real-world problems. The proposed predictive system design cycle is a purely data-driven approach. The quality of the data used to build the system is thus of crucial importance. In practice, however, the data is rarely perfect. Common problems include missing values, high dimensionality or a very limited number of labelled exemplars. In order to address these issues, this work investigated and exploited inspirations coming from physics. The novel use of well-established physical models in the form of potential fields has resulted in the derivation of a comprehensive Electrostatic Field Classification Framework for supervised and semi-supervised learning from incomplete data. Although computational power is constantly becoming cheaper and more accessible, it is not infinite. Therefore, efficient techniques that can exploit the finite amount of predictive information content in the data and limit the computational requirements of the resource-hungry predictive system design procedure are very desirable. In designing such techniques, this work once again investigated and exploited inspirations coming from physics.
By using an analogy with a set of interacting particles and the resulting Information Theoretic Learning framework, the Density Preserving Sampling technique has been derived. This technique acts as a computationally efficient alternative to cross-validation, which fits well within the proposed methodology. All methods derived in this thesis have been thoroughly tested on a number of benchmark datasets. The proposed generalised predictive model design cycle has been successfully applied to two real-world environmental problems, in which a comparative study of Density Preserving Sampling and cross-validation has also been performed, confirming the great potential of the proposed methods.
Item Type: | Thesis (Doctoral) |
---|---|
Additional Information: | If you feel that this work infringes your copyright please contact the BURO Manager. |
Group: | Faculty of Science & Technology |
ID Code: | 17518 |
Deposited By: | INVALID USER |
Deposited On: | 16 Mar 2011 09:41 |
Last Modified: | 09 Aug 2022 16:03 |