Journal of Jianghan University (Natural Science Edition), 2018, Vol. 46, Issue 6: 522-527. doi: 10.16389/j.cnki.cn42-1737/n.2018.06.006
ZHANG Xihong, WANG Yuxiang
Abstract: The volume of data on Chinese herbal medicines available on the internet is currently growing by the tens of thousands. Mining the latent relationships in these data and establishing commodity specifications and a price warning mechanism are of great significance for keeping the market running smoothly and in good order. Taking information collection from the Tiandi website of Chinese herbal medicine as an example, a spider based on Scrapy was designed to extract the name, specifications, origin, price and other information for Chinese herbal medicines. First, the structure of the target page was analyzed, and the XPath of each target element was extracted with the help of the browser's element inspection tool. Then, the spider project was built with the Scrapy framework, and the parsing rules and storage methods for the target elements were defined in the corresponding project files. Finally, the spider was tested by collecting information from the target website. Taking Panax quinquefolium and Panax notoginseng as examples, the data collected online were compared with data collected offline through on-site surveys. The results show that the designed spider obtains information from the target website quickly, efficiently and accurately, that the collected data are consistent with the offline field survey data, and that the spider can provide data support for subsequent studies.
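The extraction and storage steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' spider: to stay self-contained it swaps Scrapy's selectors for the limited XPath support in Python's standard-library `xml.etree.ElementTree`, and the page fragment, class names, field names and values are all hypothetical, since the abstract does not give the target site's markup.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment with made-up placeholder values.
# The real page structure is not described in the abstract.
SAMPLE_PAGE = """
<html><body>
<table id="prices">
  <tr class="item">
    <td class="name">Panax quinquefolium</td>
    <td class="spec">spec-A</td>
    <td class="origin">origin-A</td>
    <td class="price">1.0</td>
  </tr>
  <tr class="item">
    <td class="name">Panax notoginseng</td>
    <td class="spec">spec-B</td>
    <td class="origin">origin-B</td>
    <td class="price">2.5</td>
  </tr>
</table>
</body></html>
"""

# Parsing rules: one relative XPath per target field (assumed names).
FIELD_XPATHS = {
    "name": "td[@class='name']",
    "spec": "td[@class='spec']",
    "origin": "td[@class='origin']",
    "price": "td[@class='price']",
}


def parse_rows(page):
    """Yield one dict per listing row, applying the XPath rules above."""
    root = ET.fromstring(page.strip())
    for row in root.findall(".//tr[@class='item']"):
        item = {field: row.find(xpath).text.strip()
                for field, xpath in FIELD_XPATHS.items()}
        item["price"] = float(item["price"])  # store price as a number
        yield item


def to_csv(items):
    """Storage step: serialize the collected items as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(FIELD_XPATHS),
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(list(parse_rows(SAMPLE_PAGE))))
```

In a Scrapy project, the parsing rules would typically live in the spider's callback and the storage step in an item pipeline or feed export; the split into `parse_rows` and `to_csv` here only mirrors that separation.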
Key words: Scrapy, traditional Chinese medicinal materials, spider
CLC Number: TP391.3
ZHANG Xihong, WANG Yuxiang. Collection Method of Network Information for Traditional Chinese Medicinal Materials Based on Scrapy[J]. Journal of Jianghan University (Natural Science Edition), 2018, 46(6): 522-527.
URL: https://qks.jhun.edu.cn/jhdx_zk/EN/10.16389/j.cnki.cn42-1737/n.2018.06.006
https://qks.jhun.edu.cn/jhdx_zk/EN/Y2018/V46/I6/522