J4 ›› 2009, Vol. 36 ›› Issue (4): 614-638.

• Research Article •

A collaborative filtering algorithm that compresses the sparse user rating matrix

HOU Cui-qin; JIAO Li-cheng; ZHANG Wen-ge

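  1. (Research Institute of Intelligent Information Processing; Key Laboratory of Intelligent Perception and Image Understanding of the Ministry of Education, Xidian University, Xi'an, Shaanxi 710071, China)
  • Received: 2008-06-10 Online: 2009-08-20 Published: 2009-09-28
  • Corresponding author: HOU Cui-qin
  • Funding:

    Supported by the National Natural Science Foundation of China (60703107, 60703108, 60703109, 60702062); the National 863 Program (2006AA01Z107, 2007AA12Z136, 2007AA12Z223); the 973 Program (2006CB705700); and the Ministry of Education Program for Changjiang Scholars and Innovative Research Teams (IRT0645)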

Collaborative filtering algorithm via compressing the sparse user-rating-data matrix

HOU Cui-qin; JIAO Li-cheng; ZHANG Wen-ge

  1. (Ministry of Education Key Lab. of Intelligent Perception and Image Understanding, Research Inst. of Intelligent Information Processing, Xidian Univ., Xi'an  710071, China)
  • Received: 2008-06-10 Online: 2009-08-20 Published: 2009-09-28
  • Contact: HOU Cui-qin

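Abstract (from the Chinese):

This paper proposes a collaborative filtering algorithm that addresses data sparsity by reducing the dimension of the user rating matrix (a collaborative filtering algorithm based on probabilistic latent semantics over the multiple categories of items). First, the set of latent variables in probabilistic latent semantic analysis is fixed to the set of item categories, which gives the latent variables an explicit meaning and restricts their range. Next, the distribution of the latent variables, i.e., the user's interest model, is learned iteratively, which compresses the user rating matrix. Finally, the learned interest model is used to measure the similarity between users and to make recommendations to the target user. Simulation results show that the algorithm handles data sparsity effectively, with a mean absolute error 4% lower than that of memory-based collaborative filtering algorithms. Compared with the collaborative filtering algorithm that reduces the dimension of the user rating matrix by probabilistic latent semantic analysis, the proposed algorithm makes the meaning of the latent variables explicit, improves understanding of the system, and achieves competitive recommendation performance.

Key words: multiple categories of items, probabilistic latent semantic analysis, iterative method, collaborative filtering, algorithm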

Abstract:

This paper proposes a novel memory-based collaborative filtering algorithm, Multi-label Probabilistic Latent Semantic Analysis based Collaborative Filtering, which improves recommendation quality when the user-rating-data matrix is extremely sparse by reducing the dimension of that matrix with multi-label probabilistic latent semantic analysis. First, it confines the set of latent variables in probabilistic latent semantic analysis to the set of item labels, so that each latent variable takes on the meaning of its corresponding label. Then it learns the probability distribution of the latent variables, i.e., the model of the user's interest, to compress the user-rating-data matrix. Finally, it computes the similarity between users from the learned model and makes recommendations. Compared with conventional memory-based collaborative filtering algorithms, the proposed algorithm decreases the mean absolute error by 4 percent on average on the test dataset by reducing the dimension of the user-rating-data matrix. Compared with the collaborative filtering algorithm that reduces the dimension of the user-rating-data matrix by probabilistic latent semantic analysis, the proposed algorithm makes the recommendation system easier to interpret and achieves competitive recommendation performance.
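The three steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the update equations, so the code assumes a standard aspect-model EM in which P(item | label) is initialised to zero wherever an item lacks a label (multiplicative EM updates keep those zeros), cosine similarity between the learned user-interest rows, and a mean-centred weighted-neighbour prediction. The function names (`learn_interest_model`, `user_similarity`, `predict`, `mae`) and the parameters `n_iter` and `k` are hypothetical.

```python
import numpy as np


def learn_interest_model(R, L, n_iter=50, eps=1e-12):
    """Compress R (users x items, 0 = unrated) into P(label | user).

    L is a binary items x labels matrix; every item is assumed to carry
    at least one label. Assumption: plain aspect-model EM, with latent
    variables tied to labels through the zero pattern of P(item | label).
    """
    n_users, _ = R.shape
    n_labels = L.shape[1]
    p_z_u = np.full((n_users, n_labels), 1.0 / n_labels)  # P(z | u)
    p_i_z = L.astype(float) / (L.sum(axis=0, keepdims=True) + eps)  # P(i | z)
    for _ in range(n_iter):
        # E-step: q[u, i, z] is proportional to P(z | u) * P(i | z)
        q = p_z_u[:, None, :] * p_i_z[None, :, :]
        q /= q.sum(axis=2, keepdims=True) + eps
        w = R[:, :, None] * q  # responsibilities weighted by ratings
        # M-step: renormalise the rating mass assigned to each label
        p_z_u = w.sum(axis=1)
        p_z_u /= p_z_u.sum(axis=1, keepdims=True) + eps
        p_i_z = w.sum(axis=0)
        p_i_z /= p_i_z.sum(axis=0, keepdims=True) + eps
    return p_z_u


def user_similarity(p_z_u, eps=1e-12):
    """Cosine similarity between user interest vectors (an assumed choice)."""
    norms = np.linalg.norm(p_z_u, axis=1, keepdims=True)
    unit = p_z_u / (norms + eps)
    return unit @ unit.T


def predict(R, sim, u, i, k=2):
    """Mean-centred weighted average over the k most similar raters of item i.

    Assumes user u has rated at least one item.
    """
    mean_u = R[u][R[u] > 0].mean()
    rated = R[:, i] > 0
    rated[u] = False
    cand = np.flatnonzero(rated)
    if cand.size == 0:
        return mean_u
    top = cand[np.argsort(sim[u, cand])[::-1][:k]]
    w = sim[u, top]
    denom = np.abs(w).sum()
    if denom == 0:
        return mean_u
    means = np.array([R[v][R[v] > 0].mean() for v in top])
    return mean_u + (w * (R[top, i] - means)).sum() / denom


def mae(predicted, actual):
    """Mean absolute error, the metric quoted in the abstract."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.abs(predicted - actual).mean()
```

The point of the sketch is the compression: similarity is computed on a users x labels matrix rather than on the sparse users x items matrix, so two users who rated no items in common can still be compared through the labels of the items they did rate.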

Key words: multi-label of items, probabilistic latent semantic analysis, iterative method, collaborative filtering, algorithms

CLC Number:

  • TP181