2012, Vol. 25, Issue (11): 118-.

Research on Python-based Web Scraping Technology

QI Peng, LI Yinfeng, SONG Yuwei

  1. (School of Electronic Engineering, Xidian University, Xi'an 710126, China)
  • Online: 2012-11-15 Published: 2013-01-23

Abstract:

This paper discusses web scraping technologies. It points out the advantages of web data collection technology: fast and accurate conversion of unstructured data into structured data. The principles of web scraping at the HTTP level are introduced, with emphasis on technical solutions for Python-based web scraping. The web scraping system consists of two modules: an HTTP interaction module and a data analysis module.
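
The two-module structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`fetch`, `extract_with_regex`, `extract_with_xpath`), the `User-Agent` string, and the `SAMPLE` markup are all assumptions made for illustration, and only the standard library is used. The XPath variant relies on the limited XPath subset supported by `xml.etree.ElementTree` and assumes well-formed markup.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url, timeout=10.0):
    """HTTP interaction module (hypothetical): send a GET request, return the decoded body."""
    request = urllib.request.Request(url, headers={"User-Agent": "example-scraper/0.1"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_with_regex(html):
    """Data analysis module, regex variant: collect (href, text) pairs from anchor tags."""
    pattern = re.compile(r'<a\s+href="([^"]+)"[^>]*>(.*?)</a>', re.IGNORECASE | re.DOTALL)
    return [(href, text.strip()) for href, text in pattern.findall(html)]


def extract_with_xpath(xhtml):
    """Data analysis module, XPath variant: same output, but requires well-formed markup."""
    root = ET.fromstring(xhtml)
    return [(a.get("href"), (a.text or "").strip()) for a in root.findall(".//a[@href]")]


# Illustrative input standing in for a page body returned by fetch().
SAMPLE = (
    '<html><body><ul>'
    '<li><a href="/a">First</a></li>'
    '<li><a href="/b">Second</a></li>'
    '</ul></body></html>'
)

if __name__ == "__main__":
    print(extract_with_regex(SAMPLE))
    print(extract_with_xpath(SAMPLE))
```

Keeping the HTTP step separate from the parsing step means either extraction function can be exercised on stored page text without any network access, and the regex and XPath variants can be swapped depending on how regular the target markup is.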

Key words: Web scraping; screen scraping; HTTP; Python; regex; XPath

CLC Number: 

  • TP274+.2