Electronic Science and Technology ›› 2020, Vol. 33 ›› Issue (10): 51-56. doi: 10.16180/j.cnki.issn1007-7820.2020.10.009

Chinese Short Text Similarity Calculation Based on TextRank Algorithm

LU Jiawei, CHEN Wei, YIN Zhong

  1. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China
  • Received: 2019-07-21 Online: 2020-10-15 Published: 2020-10-20
  • Supported by:
    National Natural Science Foundation of China (61703277)

Abstract:

The traditional vector space model (VSM) often ignores text semantics, and the text feature matrix it constructs is sparse. Building on deep-learning word vector technology, this paper proposes a similarity calculation method that integrates an improved TextRank algorithm. The method uses word embeddings to build the text vector space, which gives the vector space model semantic relevance. At the same time, the improved TextRank algorithm extracts text keywords, which strengthens the expression of text features and eliminates a large amount of redundant information. This reduces the sparsity of the text feature matrix and makes the text similarity computation more efficient. Simulation experiments with different models show that fusing the improved TextRank algorithm with BERT word vectors yields better text similarity performance.
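
The pipeline described in the abstract (keyword extraction, then embedding, then vector comparison) can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain TextRank rather than the improved variant, whose details the abstract does not give; it assumes the text is already segmented into words; and it substitutes a small hypothetical table `toy_embeddings` for BERT word vectors. All function names and parameter values are illustrative assumptions.

```python
import math
from collections import defaultdict


def textrank_keywords(tokens, top_k=3, window=2, damping=0.85, iters=50):
    """Rank words with plain TextRank (not the paper's improved variant)."""
    # Undirected co-occurrence graph: words within `window` positions are linked.
    neighbors = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if tokens[j] != w:
                neighbors[w].add(tokens[j])
                neighbors[tokens[j]].add(w)
    vocab = list(dict.fromkeys(tokens))  # unique words, first-seen order
    scores = {w: 1.0 for w in vocab}
    for _ in range(iters):
        # Each word receives an equal share of every neighbor's score.
        scores = {
            w: (1 - damping) + damping * sum(
                scores[u] / len(neighbors[u]) for u in neighbors.get(w, ())
            )
            for w in vocab
        }
    return sorted(vocab, key=lambda w: scores[w], reverse=True)[:top_k]


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        raise ValueError("no keyword has an embedding")
    return [sum(component) / len(vecs) for component in zip(*vecs)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity(tokens_a, tokens_b, embeddings, top_k=3):
    """Compare two texts through the mean vectors of their top keywords."""
    keys_a = textrank_keywords(tokens_a, top_k)
    keys_b = textrank_keywords(tokens_b, top_k)
    return cosine(mean_vector(keys_a, embeddings), mean_vector(keys_b, embeddings))


# Hypothetical 3-dimensional vectors standing in for BERT word vectors.
toy_embeddings = {
    "手机": [0.9, 0.1, 0.0],
    "屏幕": [0.8, 0.2, 0.1],
    "清晰": [0.6, 0.3, 0.2],
    "电话": [0.85, 0.15, 0.05],
    "显示": [0.7, 0.25, 0.1],
    "清楚": [0.6, 0.35, 0.2],
}
text_a = ["手机", "屏幕", "清晰"]  # assumed to be pre-segmented
text_b = ["电话", "显示", "清楚"]
print(similarity(text_a, text_b, toy_embeddings))
```

Restricting each text to its top-ranked keywords before averaging is what keeps the representation compact in this sketch; a dense embedding per keyword replaces the sparse term-count columns of a classical VSM.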

Key words: text similarity, extraction, TextRank algorithm, BERT, word vector technique, vector space model

CLC Number: 

  • TP391