Pandove, D. and Malhi, A., 2021. A Correlation Based Recommendation System for Large Data Sets. Journal of Grid Computing, 19 (4), 42.
Full text available as:
PDF (OPEN ACCESS ARTICLE): Pandove-Malhi2021_Article_ACorrelationBasedRecommendatio.pdf - Published Version. Available under License Creative Commons Attribution. 3MB
Copyright to original material in this document is with the original owner(s). Access to this content through BURO is granted on condition that you use it only for research, scholarly or other non-commercial purposes. If you wish to use it for any other purposes, you must contact BU via BURO@bournemouth.ac.uk. Any third party copyright material in this document remains the property of its respective owner(s). BU grants no licence for further use of that third party material.
DOI: 10.1007/s10723-021-09585-9
Abstract
Correlation determination brings out relationships in data that had not been seen before, and it is imperative to successfully use the power of correlations for data mining. In this paper, we use the concepts of correlations to cluster data and merge them with recommendation algorithms. We propose two correlation clustering algorithms, RBACC and LGBACC, which produce high-quality hierarchical clusters: RBACC is based on finding Spearman’s rank correlation coefficient among data points, and LGBACC uses a dimensionality reduction approach (PCA) along with graph theory. Both algorithms have been tested on real-life data (New York yellow cabs dataset taken from http://www.nyc.gov), using distributed and parallel computing (Spark and R). They are found to be scalable and to perform better than the existing hierarchical clustering algorithms. These two approaches have been used to replace similarity measures in recommendation algorithms and generate a correlation clustering based recommendation system model. We have combined the power of correlation analysis with that of prediction analysis to propose a better recommendation system. It is found that this model makes better quality recommendations than the random recommendation model. This model has been validated using a real-time, large data set (MovieLens dataset, taken from http://grouplens.org/datasets/movielens/latest). The results show that combining correlated points with the predictive power of recommendation algorithms produces better quality recommendations that are faster to compute. LGBACC has approximately 25% better prediction capability but takes significantly more prediction time than RBACC.
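The idea of clustering on rank correlation can be illustrated with a minimal sketch. This is not the paper's RBACC algorithm, whose details are not given in the abstract; it only shows Spearman's rank correlation used as a similarity between data points, followed by a simple agglomerative merge. The function names, the distance 1 - rho, the average-linkage rule and the `threshold` parameter are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def agglomerate(points, threshold):
    """Hypothetical average-linkage merge on distance 1 - rho.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (an assumed stopping rule, not from the paper).
    """
    clusters = [[i] for i in range(len(points))]
    dist = {
        (i, j): 1 - spearman(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }

    def linkage(a, b):
        pairs = [(min(i, j), max(i, j)) for i in a for j in b]
        return sum(dist[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        (a, b), d = min(
            (
                ((a, b), linkage(clusters[a], clusters[b]))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda t: t[1],
        )
        if d > threshold:
            break
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # safe: a < b, so index a is unaffected
    return clusters


# Toy data: two increasing and two decreasing series.
points = [[1, 2, 3, 4, 5], [2, 4, 5, 8, 10], [5, 4, 3, 2, 1], [10, 8, 7, 3, 1]]
clusters = agglomerate(points, threshold=0.5)
```

On the toy data the two increasing series end up in one cluster and the two decreasing series in another, because Spearman's rho depends only on rank order and not on the magnitudes. A pairwise loop like this scales quadratically in the number of points, so it is only suitable for small inputs; the abstract reports that the paper's own implementations rely on Spark and R for large data sets.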
Item Type: | Article |
---|---|
ISSN: | 1570-7873 |
Uncontrolled Keywords: | Correlation clustering ; Recommendation system model ; RBACC ; LGBACC |
Group: | Faculty of Science & Technology |
ID Code: | 36176 |
Deposited By: | Symplectic RT2 |
Deposited On: | 02 Nov 2021 12:21 |
Last Modified: | 14 Mar 2022 14:30 |