2012, Vol. 25, Issue (11): 118-.
QI Peng, LI Yinfeng, SONG Yuwei
Abstract:
This paper discusses web scraping technologies. It points out the advantages of web data collection technology: fast and accurate conversion of unstructured data into structured data. The principles of web scraping at the HTTP level are introduced, with emphasis on technical solutions for Python-based web scraping. The web scraping system consists of two modules: an HTTP interaction module and a data analysis module.
Key words: Web scraping; screen scraping; HTTP; Python; regex; XPath
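The two-module structure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which the abstract does not show. The function names (`fetch`, `extract_links_regex`, `extract_rows_xpath`), the `User-Agent` string, and the sample page are all hypothetical, and the standard library's `xml.etree.ElementTree` is used for XPath here, which supports only a subset of XPath and requires well-formed markup.

```python
import re
import urllib.request
import xml.etree.ElementTree as ET


def fetch(url, timeout=10.0):
    """HTTP interaction module (hypothetical): GET a page and return its text."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-scraper/0.1"}  # assumed value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links_regex(html):
    """Data analysis module, regex variant: return (href, text) pairs."""
    pattern = re.compile(
        r'<a\s+[^>]*href="([^"]*)"[^>]*>(.*?)</a>',
        re.IGNORECASE | re.DOTALL,
    )
    return [(href, text.strip()) for href, text in pattern.findall(html)]


def extract_rows_xpath(xhtml):
    """Data analysis module, XPath variant: turn table rows into records.

    ElementTree implements a limited XPath subset and needs well-formed input.
    """
    root = ET.fromstring(xhtml)
    rows = []
    for tr in root.findall(".//table[@id='items']/tr"):
        cells = [td.text.strip() if td.text else "" for td in tr.findall("td")]
        if len(cells) == 2:
            rows.append({"name": cells[0], "price": cells[1]})
    return rows


# Hypothetical sample page standing in for a fetched response body.
SAMPLE_PAGE = """<html><body>
<a href="/item/1">First item</a>
<a class="x" href="/item/2">Second item</a>
<table id="items">
<tr><td>Widget</td><td>9.50</td></tr>
<tr><td>Gadget</td><td>12.00</td></tr>
</table>
</body></html>"""

if __name__ == "__main__":
    # In a real run the text would come from fetch(url); the sample avoids network access.
    print(extract_links_regex(SAMPLE_PAGE))
    print(extract_rows_xpath(SAMPLE_PAGE))
```

Keeping `fetch` separate from the two extraction functions mirrors the split described in the abstract: the parsing code turns unstructured markup into structured records and can be exercised on saved pages without any HTTP traffic. Real-world HTML is often not well-formed, in which case a more tolerant parser would be needed in place of `ElementTree`.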
QI Peng, LI Yinfeng, SONG Yuwei. Research on Python-based Web Scraping Technology[J]., 2012, 25(11): 118-.
URL: https://journal.xidian.edu.cn/dzkj/EN/
https://journal.xidian.edu.cn/dzkj/EN/Y2012/V25/I11/118