Journal of Xidian University, 2021, Vol. 48, Issue (6): 179-186. doi: 10.19665/j.issn1001-2400.2021.06.022

• Computer Science and Technology •

A Text Sentiment Analysis Method Based on a Character-Word Dual-Channel Network

LI Yuan1, CUI Yushuang2, WANG Wei1

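  1. School of Computer Science and Information Engineering, Anyang Institute of Technology, Anyang, Henan 455000, China
  2. School of Computer and Information Technology, Xinyang Normal University, Xinyang, Henan 464000, China
  • Received: 2020-07-06 Online: 2021-12-20 Published: 2022-02-24
  • About the authors: LI Yuan (b. 1981), male, lecturer, master's degree, E-mail: lyazjx@126.com | CUI Yushuang (b. 1991), female, lecturer, master's degree, E-mail: cuiyushuang123@163.com | WANG Wei (b. 1987), male, lecturer, master's degree, E-mail: ayit_ww@163.com
  • Supported by:
    National Natural Science Foundation of China (31872704); Henan Province Key R&D and Promotion Special Project (182102210197)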

Method for the analysis of text sentiment based on the word dual-channel network

LI Yuan1, CUI Yushuang2, WANG Wei1

  1. School of Computer Science and Information Engineering, Anyang Institute of Technology, Anyang 455000, China
  2. School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, China
  • Received: 2020-07-06 Online: 2021-12-20 Published: 2022-02-24

Abstract:

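To address the low classification accuracy and incomplete information extraction of traditional sentiment analysis methods, a sentiment analysis method based on character and word dual channels, C-A-BiLSTM, is proposed. The model applies a convolutional neural network to perform convolution on two separate channels, one of character vectors and one of word vectors. The character-vector channel extracts semantically richer local information and effectively alleviates the out-of-vocabulary word problem, while the word-vector channel uses part-of-speech tagging to obtain the part of speech of each word, resolving the polysemy problem faced by the original word vectors. Although combining the two channels efficiently mines deeper semantic and grammatical information, it cannot pick out the key information from the text tensor and consumes considerable computing power. An Attention mechanism is therefore introduced so that the model focuses on the important information and the computational complexity is reduced. On this basis, a bidirectional long short-term memory network is used to further extract contextual information, yielding more comprehensive and accurate high-quality text sentiment features. Comparative experiments show that, relative to the traditional convolutional neural network, support vector machine, and bidirectional long short-term memory network algorithms, the method reaches over 94% in accuracy, recall, and F1 score, and its error rate is reduced by about 1% to 6%, demonstrating that the method performs well in text classification tasks.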

Key words: convolutional neural network, bidirectional long short-term memory network, text sentiment analysis, character vector, Word-POS vector
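The two input channels described in the abstract can be illustrated with a minimal sketch. It only shows how the token sequences for each channel might be built before embedding. The sentences, the segmentation, the POS tags, the `word_tag` token format, and all function names are assumptions made for this illustration and do not come from the paper.

```python
# Minimal sketch of the two input channels (hypothetical helper names and toy data).

def char_channel(sentence):
    """Character channel: one token per non-whitespace character."""
    return [ch for ch in sentence if not ch.isspace()]

def word_pos_channel(tagged_words):
    """Word channel: fuse each word with its POS tag into one Word-POS token."""
    return [f"{word}_{pos}" for word, pos in tagged_words]

def build_vocab(token_lists):
    """Assign ids starting at 1; id 0 is reserved for unknown tokens."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def to_ids(tokens, vocab, unk_id=0):
    """Map tokens to integer ids; tokens missing from the vocabulary get unk_id."""
    return [vocab.get(tok, unk_id) for tok in tokens]

# Hand-segmented, hand-tagged toy sentences (assumed, not from the paper).
# "花" is a noun (flower) in the first sentence and a verb (to spend) in the second.
noun_sentence = [("这", "r"), ("朵", "q"), ("花", "n"), ("很", "d"), ("美", "a")]
verb_sentence = [("我", "r"), ("花", "v"), ("了", "u"), ("很多", "m"), ("钱", "n")]

word_pos_noun = word_pos_channel(noun_sentence)
word_pos_verb = word_pos_channel(verb_sentence)
chars_noun = char_channel("".join(w for w, _ in noun_sentence))
chars_verb = char_channel("".join(w for w, _ in verb_sentence))

word_vocab = build_vocab([word_pos_noun, word_pos_verb])
char_vocab = build_vocab([chars_noun, chars_verb])

# An unseen word falls back to the unknown id in the word channel,
# while its individual characters are still covered by the character channel.
unseen_word_ids = to_ids(word_pos_channel([("花钱", "v")]), word_vocab)
unseen_char_ids = to_ids(char_channel("花钱"), char_vocab)
print(word_pos_noun, word_pos_verb)
print(unseen_word_ids, unseen_char_ids)
```

In this toy setup the noun and verb uses of the same word receive different Word-POS tokens, and an unseen word still has known characters. These are the two effects the abstract attributes to the word and character channels respectively.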

Abstract:

A new dual-channel sentiment analysis method, C-A-BiLSTM, is proposed to address the low accuracy of traditional sentiment analysis methods and their inability to fully extract text feature information. The model performs convolution on two separate channels, one of character vectors and one of Word-POS vectors, to mine deeper semantic information. The character vector channel extracts semantically richer local information and effectively alleviates the problem of out-of-vocabulary words. The Word-POS vector channel uses part-of-speech tagging to obtain the part of speech of each word, which resolves the polysemy problem faced by the original word vectors. Combining the two channels efficiently mines deeper semantic and grammatical information, but it cannot filter the key information from the text tensor and consumes a lot of computational power. The attention mechanism is therefore introduced, and on this basis the A-BiLSTM network, which combines the attention mechanism with a BiLSTM, is used to further extract context information and obtain more comprehensive, higher-quality features. Experimental results indicate that the accuracy, recall and F1 values of the model all exceed 94%, a notable improvement over the CNN, SVM and BiLSTM algorithms, and that the error rate is reduced by about 1% to 6%. The method therefore has an advantage in text analysis tasks.

Key words: CNN, BiLSTM, text sentiment analysis, character vector, Word-POS vector
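The evaluation indicators named in the abstract (accuracy, recall, F1 value, error rate) can be made concrete with a short sketch. It assumes binary sentiment labels and takes the error rate to be 1 minus accuracy. Neither assumption is confirmed by the abstract, and the labels below are toy data, not the paper's results.

```python
# Sketch of the evaluation metrics for binary sentiment labels (toy data, assumed setup).

def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the chosen positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, F1 and error rate as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # Assumption: error rate is defined as the share of misclassified samples.
        "error_rate": 1.0 - accuracy,
    }

# Toy labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Under this definition, a reduction of the error rate by a given number of percentage points corresponds to an equal gain in accuracy over the baseline being compared.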

CLC Number:

  • TP391