2012, Vol. 25, Issue (1): 105-.
LI Jian
Abstract:
To remove noisy information from web pages effectively, this paper proposes a web page purification method based on an improved DOM tree and a BP neural network. A block tree is built from the DOM tree and the page content using HTMLParser, which splits the whole page into several sub-block trees according to their relations and so reduces the processing of the whole page to the processing of individual sub-blocks. Statistical data show that the content of a sub-block has evident numerical characteristics, so sub-blocks can serve as the learning source for the BP network. In this way, web page purification is converted into the task of establishing a purification model through learning. Experimental results show that the method achieves satisfactory results when applied to themed Chinese web pages.
Key words: web page purification; DOM tree; content block; neural network
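The block-tree step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it uses Python's standard `html.parser` as a stand-in for the HTMLParser library named in the abstract, and the set of block tags, the `Block` class, and the three features (text length, link count, link-text ratio) are all assumptions chosen for illustration, since the abstract does not list the numerical characteristics actually used.

```python
from html.parser import HTMLParser

# Assumed set of tags that open a new content block (hypothetical choice).
BLOCK_TAGS = {"div", "td", "p", "ul", "table"}


class Block:
    """One node of the block tree, holding simple numeric statistics."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []
        self.text_len = 0        # characters of text directly in this block
        self.link_text_len = 0   # characters of that text inside <a> tags
        self.link_count = 0      # number of <a> tags directly in this block


class BlockTreeBuilder(HTMLParser):
    """Builds a block tree from HTML while collecting per-block statistics."""

    def __init__(self):
        super().__init__()
        self.root = Block("root")
        self.current = self.root
        self.in_link = 0

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            child = Block(tag, self.current)
            self.current.children.append(child)
            self.current = child
        elif tag == "a":
            self.in_link += 1
            self.current.link_count += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.in_link > 0:
            self.in_link -= 1
        elif tag in BLOCK_TAGS and self.current.tag == tag and self.current.parent:
            self.current = self.current.parent

    def handle_data(self, data):
        n = len(data.strip())
        self.current.text_len += n
        if self.in_link:
            self.current.link_text_len += n


def features(block):
    """Return an assumed feature vector: [text length, link count, link-text ratio]."""
    ratio = block.link_text_len / block.text_len if block.text_len else 0.0
    return [block.text_len, block.link_count, ratio]


def iter_blocks(block):
    """Yield every sub-block below `block` in document order."""
    for child in block.children:
        yield child
        yield from iter_blocks(child)


def extract(html):
    """Parse `html` and return (tag, feature vector) for each sub-block."""
    builder = BlockTreeBuilder()
    builder.feed(html)
    builder.close()
    return [(b.tag, features(b)) for b in iter_blocks(builder.root)]


SAMPLE_HTML = (
    '<div><a href="/a">Home</a><a href="/b">News</a></div>'
    "<div><p>This is the main article text of the page.</p></div>"
)

for tag, vec in extract(SAMPLE_HTML):
    print(tag, vec)
```

In this sketch a navigation-style block has a link-text ratio near 1, while a content block has a ratio near 0. Vectors of this kind, labeled as noise or content, are the sort of input a BP neural network could be trained on; the training step itself is omitted here.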
LI Jian. Application Research of Web Page Purification Based on DOM and Neural Network[J]., 2012, 25(1): 105-.
URL: https://journal.xidian.edu.cn/dzkj/EN/
https://journal.xidian.edu.cn/dzkj/EN/Y2012/V25/I1/105