Journal of Jianghan University (Natural Science Edition) ›› 2018, Vol. 46 ›› Issue (6): 522-527. doi: 10.16389/j.cnki.cn42-1737/n.2018.06.006


Collection Method of Network Information for Traditional Chinese Medicinal Materials Based on Scrapy

ZHANG Xihong, WANG Yuxiang

  1. Bozhou Vocational and Technical College, Bozhou 236800, Anhui, China
  • Online: 2018-12-28 Published: 2018-11-29

Abstract: The volume of data on Chinese herbal medicines available on the internet is currently growing by the tens of thousands. Mining the latent relationships in these data and establishing commodity specifications and a price early-warning mechanism are of great significance for guiding the smooth and orderly operation of the market. Taking information collection from the Tiandi website of Chinese herbal medicine as an example, a spider based on Scrapy was designed to extract the name, specifications, origin, price, and other information for Chinese herbal medicines. First, the structure of the target page was analyzed, and the XPath expressions of the target elements were extracted with the browser's element inspection tool. Then, the spider project was built with the Scrapy framework, and the parsing rules and storage methods for the target elements were defined in the corresponding project files. Finally, the spider was tested by collecting information from the target website. Taking Panax quinquefolium and Panax notoginseng as examples, the data collected online were compared with data collected offline in the field. The results show that the designed spider obtains information from the target website quickly, efficiently, and accurately, that the collected data are consistent with the offline field survey data, and that the spider can provide data support for subsequent studies.

Key words: Scrapy, traditional Chinese medicinal materials, spider
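
The parse-then-store workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' spider: the page markup, the class names, the `parse_rows` and `to_csv` helpers, and all field values below are hypothetical placeholders, since the abstract does not give the target page's structure. To stay runnable without third-party packages, the sketch uses the limited XPath subset of Python's standard-library `xml.etree.ElementTree` as a stand-in for Scrapy's selectors.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical, simplified listing markup with placeholder values.
# The real page structure and real prices are NOT given in the abstract.
SAMPLE_PAGE = """
<html><body>
  <table id="prices">
    <tr class="row">
      <td class="name">Panax quinquefolium</td>
      <td class="spec">spec A</td>
      <td class="origin">origin A</td>
      <td class="price">1.0</td>
    </tr>
    <tr class="row">
      <td class="name">Panax notoginseng</td>
      <td class="spec">spec B</td>
      <td class="origin">origin B</td>
      <td class="price">2.0</td>
    </tr>
  </table>
</body></html>
"""

# Fields named in the abstract: name, specifications, origin, price.
FIELDS = ("name", "spec", "origin", "price")


def parse_rows(page):
    """Yield one dict per listing row, located with XPath-style expressions."""
    root = ET.fromstring(page.strip())
    # Assumed row locator; on a real page it would be copied from the
    # browser's element inspection tool.
    for row in root.findall(".//table[@id='prices']/tr[@class='row']"):
        record = {}
        for field in FIELDS:
            cell = row.find(f"./td[@class='{field}']")
            # Missing cells become empty strings rather than raising.
            record[field] = (cell.text or "").strip() if cell is not None else ""
        yield record


def to_csv(records):
    """Serialize the extracted records to CSV text (the storage step)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(list(parse_rows(SAMPLE_PAGE))))
```

In an actual Scrapy project, the same kind of expressions would typically be evaluated with `response.xpath(...)` inside the spider's `parse` callback, and the storage step would usually live in an item pipeline's `process_item` method. How the authors divided this across their project files is not stated in the abstract.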
