Journal of Xidian University ›› 2019, Vol. 46 ›› Issue (2): 83-88. doi: 10.19665/j.issn1001-2400.2019.02.014


A method for mining user interests in heterogeneous social networks

TU Shouzhong1, YAN Zhou2, WEI Lingwei2, ZHU Xiaoyan1

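  1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
  2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China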
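  • Received: 2018-08-21 Online: 2019-04-20 Published: 2019-04-20
  • About the author: TU Shouzhong (b. 1983), male, Ph.D. candidate at Tsinghua University, E-mail: tusz11@mails.tsinghua.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61332007)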

User interesting mining method in the heterogeneous social network

TU Shouzhong1,YAN Zhou2,WEI Lingwei2,ZHU Xiaoyan1   

  1. School of Computer Science and Technology, Tsinghua Univ., Beijing 100084, China
    2. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
  • Received: 2018-08-21 Online: 2019-04-20 Published: 2019-04-20

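Abstract (Chinese version):

Mainstream network platforms increasingly show a trend of "social platforms becoming content-oriented and content platforms becoming social", and users are becoming increasingly differentiated, with super nodes that have very large numbers of followers emerging. In response, an interest mining model based on social relations is proposed. The model combines matrix factorization with a label propagation algorithm, divides users into content publishers and ordinary users, and extracts and computes interest topics for each class separately, making it possible to discover and mine user interests in a large-scale heterogeneous network. Comparative experiments designed on a Zhihu dataset verify the effectiveness of the model and the performance advantage of the algorithm. Compared with the baseline methods, the algorithm improves recall by up to about 42% and the F1 score by up to about 33%.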

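Keywords (Chinese version): heterogeneous network, social network, interest model, non-negative matrix factorization, label propagation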

Abstract:

With the rapid growth of the mobile Internet, the Social Network Service (SNS) has become an indispensable service. Current mainstream social media increasingly combine social services with information services, which interwork to provide a better experience. Meanwhile, users are becoming increasingly polarized. These heterogeneous features, namely the combination of information content with sociality and the polarization of user roles, challenge traditional research on social media. Some studies of social media mainly assume that nodes hold equal positions or have similar relations. If the algorithms from these studies are applied directly to networks whose users are highly polarized, the results may be distorted or may even differ substantially from reality. This paper proposes a new interest mining model based on social relations. To deal with the polarization in social media, we combine matrix factorization with the label propagation algorithm to treat information disseminators and average users separately, in order to discover the interests of average users in a large-scale heterogeneous network. The validity of the model and the performance advantages of the algorithm are tested and verified on Zhihu datasets. Experiments show that, compared with the baseline, the proposed method improves recall by up to 42%.

Key words: heterogeneous network, social network, interest model, non-negative matrix factorization, label propagation

CLC number:

  • TP391.1
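
The abstract names label propagation as the step that carries interests from information disseminators to average users, but it does not give the algorithm's details. The following is therefore only a minimal sketch of generic label propagation over a follow graph, not the authors' implementation. It assumes that each content publisher already has a topic distribution (for example, from a matrix factorization step that is not shown here) and that an ordinary user's interests can be approximated by averaging over the accounts that user follows. The names `propagate_interests`, `follows`, and `seed_topics` are hypothetical.

```python
def propagate_interests(follows, seed_topics, num_topics, iterations=20):
    """Spread topic scores from seed users to everyone else.

    follows:     dict mapping a user to the list of users they follow
                 (hypothetical input format).
    seed_topics: dict mapping a content publisher to a fixed list of
                 num_topics scores; these values are clamped and never change.
    Returns a dict mapping every user to a list of num_topics scores.
    """
    # Collect every user that appears anywhere in the inputs.
    users = set(follows) | set(seed_topics)
    for followees in follows.values():
        users.update(followees)

    # Seeds start with their known scores, everyone else with zeros.
    scores = {u: list(seed_topics.get(u, [0.0] * num_topics)) for u in users}

    for _ in range(iterations):
        updated = {}
        for u in users:
            if u in seed_topics:
                # Clamp publishers to their known topic distribution.
                updated[u] = list(seed_topics[u])
                continue
            followees = follows.get(u, [])
            if not followees:
                # No outgoing follow edges: nothing to propagate from.
                updated[u] = scores[u]
                continue
            # Average the previous-round scores of the accounts u follows.
            updated[u] = [
                sum(scores[v][k] for v in followees) / len(followees)
                for k in range(num_topics)
            ]
        scores = updated
    return scores


# Toy example: two publishers with one topic each, three ordinary users.
seed_topics = {"pub_a": [1.0, 0.0], "pub_b": [0.0, 1.0]}
follows = {"u1": ["pub_a"], "u2": ["pub_a", "pub_b"], "u3": ["u1", "u2"]}
result = propagate_interests(follows, seed_topics, num_topics=2)
print(result["u3"])
```

In this toy graph, `u3` follows no publisher directly and receives its scores only through `u1` and `u2`, which is the kind of indirect inference that propagation provides. Clamping the publishers keeps the high-degree accounts as fixed sources of labels; whether the paper uses this exact update rule is not stated in the abstract.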