沈俊林 / webmagic
Commit 78cfb4d5, authored Jul 30, 2013 by yihua.huang
dep
parent aa9bee7b
Showing 1 changed file with 2 additions and 1 deletion
README.md
...
...
@@ -33,6 +33,8 @@ webmagic's functionality covers the entire crawler lifecycle (link extraction, page downloading
###Get Started
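The core of customizing webmagic is the PageProcessor interface.
The project is managed with Maven. If you do not use Maven, you can download the dependency packages from the [http://git.oschina.net/flashsword20/webmagic-bin](http://git.oschina.net/flashsword20/webmagic-bin) repository (the code in that repository is not synced in real time, but the dependencies should not change).
For example, to implement a simple general-purpose crawler, SimplePageProcessor, the code is as follows: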
...
...
@@ -73,7 +75,6 @@ The core of customizing webmagic is the PageProcessor interface.
Spider.create(new SimplePageProcessor("http://my.oschina.net/", "http://my.oschina.net/*/blog/*")).run();
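The second argument in the call above is a wildcard URL pattern. The diff does not show how SimplePageProcessor interprets it, so the following is only a minimal sketch of one way such a pattern could be matched, assuming `*` stands for any run of characters other than quotes and `#`. The class `UrlPatternSketch` and its methods are hypothetical and are not part of webmagic.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not webmagic code: matches URLs against a wildcard pattern.
public class UrlPatternSketch {

    // Assumption: '*' matches any run of characters except quotes and '#'.
    // Everything else in the pattern is matched literally.
    static Pattern toRegex(String wildcard) {
        String[] parts = wildcard.split("\\*", -1); // -1 keeps a trailing empty part
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append("[^\"'#]*"); // one wildcard between literal parts
            }
            regex.append(Pattern.quote(parts[i])); // escape '.', '/', etc.
        }
        return Pattern.compile(regex.toString());
    }

    // True only if the whole URL matches the wildcard pattern.
    static boolean matches(String wildcard, String url) {
        return toRegex(wildcard).matcher(url).matches();
    }

    public static void main(String[] args) {
        String pattern = "http://my.oschina.net/*/blog/*";
        // The URLs below are made up for illustration.
        System.out.println(matches(pattern, "http://my.oschina.net/someuser/blog/12345"));
        System.out.println(matches(pattern, "http://my.oschina.net/someuser"));
    }
}
```

Under these assumptions the first URL matches and the second does not, because it has no `/blog/` segment. Quoting the literal parts keeps the dots in the host name from acting as regex wildcards.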
### Examples
The webmagic-samples directory contains examples of custom PageProcessors that extract content from different sites.
...
...