J4 ›› 2013, Vol. 40 ›› Issue (2): 89-97+129. doi: 10.3969/j.issn.1001-2400.2013.02.015

• Original Articles

Chinese text semantic representation for text classification

SONG Shengli; WANG Shaolong; CHEN Ping

  1. (Research Inst. of Software Engineering, Xidian Univ., Xi'an 710071, China)
  • Received: 2011-11-11 Online: 2013-04-20 Published: 2013-05-22
  • Contact: SONG Shengli E-mail: shlsong@xidian.edu.cn

Abstract:

Text representation based on word frequency statistics is often unsatisfactory because it treats words as independent features and ignores the semantic relationships between them. This paper proposes a new Chinese text semantic representation model that takes into account both the contextual semantics and the background information of the words in a text. The method captures the semantic relationships between words using Wikipedia as a knowledge base. Words with strong semantic relationships are combined into a word-package, which is represented by a graph node weighted with the sum of the number and frequency of the words it contains. The contextual relationship between words in different word-packages is represented by a directed edge, which is weighted with the maximum weight of its adjacent nodes. The model retains the contextual information of each word to a large extent while strengthening the semantic relationships between words. Experimental results on Chinese text classification show that the proposed model expresses the content of a text accurately and improves classification performance. Compared with Support Vector Machines, Text Semantic Graph-based Classification improves efficiency by 7.8%, reduces the error rate by one third, and is more stable.
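
The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several points are assumptions made only for illustration: the Wikipedia-based relatedness measure is replaced by a hypothetical lookup `is_related`; related words are grouped transitively with a union-find; the node weight is read as (number of distinct words in the package) + (sum of their frequencies); and a "contextual relationship" is taken to mean that two words are adjacent in the token sequence. English toy tokens stand in for segmented Chinese text.

```python
from collections import Counter
from itertools import combinations


def build_text_semantic_graph(tokens, is_related):
    """Build a toy text semantic graph (illustrative sketch, not the paper's code).

    tokens:     list of words in text order (assumed already segmented)
    is_related: hypothetical callable (w1, w2) -> bool standing in for a
                Wikipedia-based semantic relatedness test
    Returns (node_weight, edges); a node is a tuple of the words in a package.
    """
    freq = Counter(tokens)
    vocab = list(freq)  # distinct words in first-occurrence order

    # Union-find over the vocabulary: related words end up in one package.
    parent = {w: w for w in vocab}

    def find(w):
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in combinations(vocab, 2):
        if is_related(a, b):
            parent[find(b)] = find(a)

    # Collect packages; each package is identified by the tuple of its words.
    groups = {}
    for w in vocab:
        groups.setdefault(find(w), []).append(w)
    package_of = {}
    for words in groups.values():
        pkg = tuple(words)
        for w in words:
            package_of[w] = pkg

    # Assumed node weight: number of distinct words + sum of their frequencies.
    node_weight = {
        pkg: len(pkg) + sum(freq[w] for w in pkg)
        for pkg in set(package_of.values())
    }

    # Directed edge between the packages of adjacent words in different
    # packages, weighted with the larger of the two node weights.
    edges = {}
    for prev, nxt in zip(tokens, tokens[1:]):
        u, v = package_of[prev], package_of[nxt]
        if u != v:
            edges[(u, v)] = max(node_weight[u], node_weight[v])

    return node_weight, edges


# Toy example: only "cat" and "dog" are declared related.
related_pairs = {frozenset(("cat", "dog"))}
tokens = ["cat", "dog", "eats", "fish", "cat"]
node_weight, edges = build_text_semantic_graph(
    tokens, lambda a, b: frozenset((a, b)) in related_pairs
)
```

In this toy run the package `("cat", "dog")` holds 2 distinct words with a total frequency of 3, so its assumed weight is 5, and every edge touching it also gets weight 5. A real system would replace `is_related` with a thresholded relatedness score from a knowledge base.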

Key words: classification, knowledge representation, similarity, text semantic graph

CLC Number: 

  • TP181