Research on a Data Collection System Based on Python Web Crawler Technology_Zhong Jiling.pdf

INFORMATION COMMUNICATIONS, 2020, No. 04 (Sum. No. 208)

Research on a Python-Based Web Crawler Data Collection System

Zhong Jiling
(Heyuan Polytechnic, Heyuan, Guangdong 517000)

Abstract: Collecting data from the Internet is the key to solving the problem of data sources. This paper studies and develops a data collection system based on Python web crawler technology that collects topic-specific data automatically. Using the urllib, Beautiful Soup, and threading libraries, a system model framework was designed and developed that comprises modules for data crawling, exception handling, robots protocol management, and multithreading management. The data collection process is illustrated through a specific case. Compared with traditional manual data collection, the system greatly improves work efficiency.

Keywords: web crawler; data collection; Python technology

CLC number: TP393    Document code: A    Article ID: 1673-1131(2020)04-0096-03

0 Introduction

... in the back-end database. This paper uses Python crawler techniques to study the technologies behind topic-focused web crawlers.

1 Introduction to Python Crawler Technologies

With the development of big data technology, collecting data from the Internet has become an important channel through which the public obtains data. Web crawler technology uses programs to crawl Internet web pages automatically, thereby automating data collection, and has been widely applied in search