2012, Vol. 25, Issue (1): 105-.


Application Research of Web Page Purification Based on DOM and Neural Network

 LI Jian   

  1. (Battle Laboratory, Nanchang Army College, Nanchang 330103, China)
  • Online: 2012-01-15 Published: 2012-01-10

Abstract:

To remove noisy information from web pages effectively, this paper proposes a web page purification method based on an improved DOM tree and a BP neural network. A block tree is built from the DOM tree and the page content using HTMLParser, and the whole content is split into several sub-block trees according to their relations, which reduces the processing of the whole page to the processing of sub-blocks. Statistical data show that the content of a sub-block has evident numerical characteristics, so sub-blocks can serve as the learning source for the BP network. In this way, web page purification is converted into the task of establishing a purification model through learning. Experimental results show that the method achieves satisfactory results when applied to Chinese web pages with themes.
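
As a rough illustration of what "numerical characteristics" of a sub-block might look like, the following minimal sketch walks an HTML string and records a few per-block statistics. It is not the paper's implementation: it uses Python's standard `html.parser` module rather than the HTMLParser library named in the abstract, and the set of block tags (`BLOCK_TAGS`), the class name `BlockFeatureParser`, the function `block_features`, and the chosen features (text length, link count, link-text ratio) are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags are treated as block boundaries; the paper's
# actual blocking rules are not given in the abstract.
BLOCK_TAGS = {"div", "td", "p", "ul", "table"}


class BlockFeatureParser(HTMLParser):
    """Collects simple numeric statistics for each block element."""

    def __init__(self):
        super().__init__()
        self.stack = []    # blocks that are currently open
        self.blocks = []   # blocks that have been closed, in closing order
        self.in_link = 0   # nesting depth of <a> elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append(
                {"tag": tag, "text_len": 0, "link_text_len": 0, "links": 0}
            )
        elif tag == "a":
            self.in_link += 1
            if self.stack:
                self.stack[-1]["links"] += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link:
            self.in_link -= 1
        elif tag in BLOCK_TAGS and self.stack and self.stack[-1]["tag"] == tag:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        # Text is attributed to the innermost open block only.
        n = len(data.strip())
        if n and self.stack:
            self.stack[-1]["text_len"] += n
            if self.in_link:
                self.stack[-1]["link_text_len"] += n


def block_features(html):
    """Return (tag, text_len, link_count, link_text_ratio) for each block."""
    parser = BlockFeatureParser()
    parser.feed(html)
    parser.close()
    feats = []
    for b in parser.blocks:
        ratio = b["link_text_len"] / b["text_len"] if b["text_len"] else 0.0
        feats.append((b["tag"], b["text_len"], b["links"], ratio))
    return feats
```

Under these assumptions, a navigation-style block tends to have a link-text ratio near 1, while a body-text block tends to have a ratio near 0. Vectors of this kind, paired with content/noise labels, are the sort of input a BP network could be trained on.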

Key words: web page purification; DOM tree; content block; neural network

CLC Number: 

  • TP393.07