Electronic Science and Technology ›› 2020, Vol. 33 ›› Issue (2): 54-59. doi: 10.16180/j.cnki.issn1007-7820.2020.02.010


Implementation of a Collaborative Filtering Algorithm Based on Improved Similarity

XU Fengxiang

  1. School of Computer, North China University of Technology, Beijing 100144, China
  • Received: 2019-01-03 Online: 2020-02-15 Published: 2020-03-12
  • About the author: XU Fengxiang (b. 1994), male, master's student. Research interests: data processing technology and software services.
  • Supported by:
    Beijing Municipal Natural Science Foundation-City Education Commission Joint Key Project (KZ201810009011)

Abstract:

When computing similarity, a collaborative filtering algorithm gives every user or item the same similarity weight, which biases the similarity calculation. To address this problem, this paper proposes a collaborative filtering algorithm with improved similarity. The algorithm first adds an active-user penalty factor, based on user activity, when computing the similarity between users. It then adds a popular-item penalty factor, based on item popularity, when computing the similarity between items. The similarities are then max-normalized, and finally movie ratings are predicted from the similarity matrix. Experimental results show that the improved similarity algorithm predicts ratings more accurately, with the mean absolute error (MAE) stable at around 0.72.

Key words: collaborative filtering, Pearson coefficient, similarity algorithm, normalization, mean absolute error, rating prediction

CLC number:

  • TP301.6
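
The pipeline outlined in the abstract (penalized similarity, max normalization, rating prediction, MAE) can be illustrated with a minimal sketch. The abstract does not give the penalty formulas, so the logarithmic damping term below is a placeholder assumption and not the paper's definition. The function names, the use of 0 for "unrated", the global (whole-matrix) max normalization, and the mean-centered prediction rule are likewise illustrative choices.

```python
import numpy as np


def pearson_rows(R):
    """Pearson correlation between rows of R, using only co-rated columns.

    R is a ratings matrix in which 0 means "not rated" (an assumption of
    this sketch). Pairs with fewer than two co-rated columns get 0.
    """
    n = R.shape[0]
    S = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            common = (R[a] > 0) & (R[b] > 0)
            if common.sum() < 2:
                continue
            x = R[a, common] - R[a, common].mean()
            y = R[b, common] - R[b, common].mean()
            denom = np.sqrt((x ** 2).sum() * (y ** 2).sum())
            if denom > 0:
                S[a, b] = S[b, a] = (x * y).sum() / denom
    return S


def penalized_similarity(R):
    """Pearson similarity damped by a count-based penalty, then max-normalized.

    Pass a user-by-item matrix for user similarity (counts = user activity),
    or its transpose for item similarity (counts = item popularity).
    """
    S = pearson_rows(R)
    counts = (R > 0).sum(axis=1)
    # Hypothetical penalty: the more ratings a pair of rows has, the more
    # its similarity is damped. The paper's actual factor may differ.
    penalty = 1.0 / np.log2(2.0 + np.sqrt(np.outer(counts, counts)))
    S = S * penalty
    peak = np.abs(S).max()
    # Max normalization: scale so the largest absolute similarity is 1.
    return S / peak if peak > 0 else S


def predict(R, S, u, i):
    """Mean-centered, similarity-weighted prediction of user u's rating of item i."""
    mean_u = R[u][R[u] > 0].mean()
    num = den = 0.0
    for v in range(R.shape[0]):
        if v != u and R[v, i] > 0 and S[u, v] > 0:
            mean_v = R[v][R[v] > 0].mean()
            num += S[u, v] * (R[v, i] - mean_v)
            den += S[u, v]
    return mean_u if den == 0 else mean_u + num / den


def mae(predicted, actual):
    """Mean absolute error between predicted and actual ratings."""
    return float(np.mean(np.abs(np.asarray(predicted) - np.asarray(actual))))


# Toy user-by-item ratings (rows: users, columns: movies); made-up data.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
], dtype=float)

user_sim = penalized_similarity(ratings)      # user-user similarity
item_sim = penalized_similarity(ratings.T)    # item-item similarity
estimate = predict(ratings, user_sim, u=1, i=1)
```

In a real evaluation, `predict` would be run on held-out ratings and `mae` applied to the resulting predictions. The 0.72 figure reported in the abstract comes from the authors' own experiments and is not something this toy example reproduces.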