zhugw authored
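While using the crawler I found that urls.txt contained duplicate URLs. Tracing the source code showed that, after the file is loaded at initialization, all of its URLs are read into a set, but when URLs to be crawled are added later there is no check for whether they already exist in that set (that is, in the file). This is what produces the duplicate URLs in the file. I have modified the source accordingly; would the author please review.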
1db940a0
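The fix described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the class `UrlFileStore` and its `add` method are hypothetical names chosen for the example, and the one-URL-per-line file layout is an assumption. It loads the existing file into a set at construction time and then consults that same set before appending a new URL.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of WebMagic: keeps a URL file free of duplicates.
class UrlFileStore {
    private final Path file;
    private final Set<String> seen = new HashSet<>();

    UrlFileStore(Path file) throws IOException {
        this.file = file;
        // Initialization: read every URL already in the file into the set.
        if (Files.exists(file)) {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                if (!line.isEmpty()) {
                    seen.add(line);
                }
            }
        }
    }

    // Returns true if the URL was new and has been appended to the file.
    synchronized boolean add(String url) throws IOException {
        // The missing check: skip URLs that are already in the set (and so in the file).
        if (!seen.add(url)) {
            return false;
        }
        Files.write(file, Collections.singletonList(url), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        return true;
    }
}
```

Because the set is populated from the file before any new URL is accepted, restarting the program against an existing urls.txt does not write the same URL a second time.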
Name |
---|
assets |
en_docs |
webmagic-avalon |
webmagic-core |
webmagic-extension |
webmagic-samples |
webmagic-saxon |
webmagic-scripts |
webmagic-selenium |
zh_docs |
.gitignore |
.travis.yml |
README.md |
pom.xml |
release-note.md |
user-manual.md |
webmagic-avalon.md |