He, H., Tiwari, A., Mehnen, J., Watson, T., Maple, C., Jin, Y. and Gabrys, B., 2016. Incremental Information Gain Analysis of Input Attribute Impact on RBF-Kernel SVM Spam Detection. In: 2016 IEEE Congress on Evolutionary Computation (IEEE CEC), 24-29 July 2016, Vancouver, Canada.
Full text available as:
PDF: He_et_al_SVM_Spam_Detection_CEC_2016.pdf - Accepted Version (289kB). Available under License Creative Commons Attribution Non-commercial No Derivatives.
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
Abstract
The massive increase in spam poses a serious threat to email and SMS, which have become important means of communication. Spam not only annoys users but also poses a security threat. Machine learning techniques have been widely used for spam detection. Email spam can be detected from senders' behaviour, the contents of an email, its subject and source address, etc., while SMS spam detection is usually based on the tokens or features of messages because their content is short. However, a comprehensive analysis of email/SMS content may provide cues that help users become aware of email/SMS spam. We cannot depend completely on automatic tools to identify all spam. In this paper, we propose an analysis approach based on information entropy and incremental learning to see how various features affect the performance of an RBF-based SVM spam detector, so as to increase our awareness of spam by sensing its features. The experiments were carried out on the Spambase and SMSSpamCollection datasets in the UCI Machine Learning Repository. The results show that some features have a significant impact on spam detection, of which users should be aware, and that there exists a feature space that achieves Pareto efficiency in True Positive Rate and True Negative Rate.
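As a rough illustration of two ideas named in the abstract, the sketch below ranks discrete features by information gain (the entropy-based measure) and filters candidate results down to those that are Pareto-efficient in True Positive Rate and True Negative Rate. It is not the authors' implementation: the toy data, the feature names (`contains_free`, `has_digit`) and the (TPR, TNR) values are invented for illustration, and the step of retraining an RBF-kernel SVM as each ranked feature is added is omitted.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pareto_front(points):
    """Keep the (tpr, tnr) points that no other point dominates."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical toy data: 1 = spam, 0 = legitimate message.
labels = [1, 1, 1, 0, 0, 0]
features = {
    "contains_free": [1, 1, 1, 0, 0, 0],  # separates the classes perfectly
    "has_digit": [1, 0, 1, 0, 1, 0],      # only weakly related to the label
}

# Rank features from most to least informative; an incremental analysis
# would add them to the detector in this order and re-evaluate each time.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)

# Hypothetical (TPR, TNR) results for three candidate feature subsets.
candidates = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7)]
efficient = pareto_front(candidates)
```

Here a result is treated as Pareto-efficient when no other candidate is at least as good on both rates and strictly better on one, which is the usual definition of dominance for two objectives that are both maximised.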
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Group: | Faculty of Science & Technology |
ID Code: | 24678 |
Deposited By: | Symplectic RT2 |
Deposited On: | 26 Sep 2016 13:16 |
Last Modified: | 14 Mar 2022 13:58 |