Electronic Science and Technology (电子科技) ›› 2020, Vol. 33 ›› Issue (7): 12-16. doi: 10.16180/j.cnki.issn1007-7820.2020.07.003


Research of Emotional Analysis Based on LDA Topic Model

LIU Yanwen (刘艳文), WEI Yun (魏赟)

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2019-04-22 Online: 2020-07-15 Published: 2020-07-15
  • About the authors: LIU Yanwen (b. 1992), female, master's student. Research interests: data analysis and mining. | WEI Yun (b. 1976), female, Ph.D., associate professor. Research interests: intelligent transportation, peer-to-peer networks, distributed systems.
  • Supported by:
    National Natural Science Foundation of China (1170277); National Natural Science Foundation of China (61472256); Shanghai Science and Technology Commission Scientific Research Project (16111107502)

Abstract:

When extracting features, the LDA topic model does not account for word associations or related word pairs, which lowers the accuracy of sentiment polarity classification. To address this problem, this paper proposes a new model that introduces a feature-sentiment word pair extraction method into the LDA topic model to improve the extraction of feature-sentiment word pairs. Dependency parsing is used to design a method for identifying feature-sentiment word pairs, and this identification method is then introduced into the LDA model as a constraint for extracting the pairs. The model parameters are estimated by Gibbs sampling, and the generative process of the model is given. Finally, the random forest classification method is used to classify the sentiment polarity of the text. To verify the effectiveness of the proposed model, it was compared experimentally with two other models. With 20 topics, the precision, recall, and F-measure of the proposed model were 81.54%, 83.13%, and 82.33%, respectively, significantly higher than those of the other two models.

Key words: product reviews, sentiment analysis, dependency syntax, feature extraction, LDA topic model, random forest algorithm
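
The three figures reported in the abstract can be cross-checked against one another. The sketch below assumes that the F-measure is the balanced F1 score (the harmonic mean of precision and recall) and that the first figure is precision, as the English abstract labels it; the abstract states neither explicitly. The function name `f_measure` and its `beta` parameter are illustrative, not taken from the paper.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Return the F-beta score; beta = 1 gives the harmonic mean (F1).

    Inputs may be fractions or percentages, as long as both use the same scale.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both inputs are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Values reported in the abstract for 20 topics (in percent).
reported_precision = 81.54
reported_recall = 83.13

# Assumption: the reported F-measure is F1.
f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")
```

Under these assumptions the computed value rounds to 82.33%, which agrees with the reported F-measure.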

CLC Number:

  • TP391.1