Commit 69ff524d, authored Jun 19, 2013 by yihua.huang
Update API in README
Parent: 986ae0be
Showing 1 changed file with 4 additions and 4 deletions
README.md
...
...
@@ -18,7 +18,7 @@ webmagic is under active development and has no stable release yet. Developers are welcome to...
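 * ####Vertical crawler####
   webmagic focuses on page extraction. Developers can extract links and content with XPath and regular expressions; the API supports chained calls and conversion between single and multiple results.
-      String content = page.getHtml().x("//div[@class='body']").r("这段话比较重要(.*)").toString();
+      String content = page.getHtml().xpath("//div[@class='body']").regex("这段话比较重要(.*)").toString();
 * ####Embedded & configuration-free####
   Unlike full-stack frameworks, webmagic has no configuration files; most features are available through simple API calls. webmagic ships as a jar, depends on no framework, and can be called from anywhere in a program.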
...
...
@@ -57,13 +57,13 @@ The core of webmagic customization is the PageProcessor interface.
     @Override
     public void process(Page page) {
-        List<String> requests = page.getHtml().as().rs(urlPattern).toStrings();
+        List<String> requests = page.getHtml().links().regex(urlPattern).toStrings();
         // Call page.addTargetRequests() to add the links to be crawled
         page.addTargetRequests(requests);
         // Extract using XPath
-        page.putField("title", page.getHtml().x("//title"));
+        page.putField("title", page.getHtml().xpath("//title"));
         // sc means extracting the main content with the Readability technique
-        page.putField("content", page.getHtml().sc());
+        page.putField("content", page.getHtml().smartContent());
     }
     @Override
...
...
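The renamed calls in this commit (`xpath(...)` followed by `regex(...)`, ending in `toString()`) form a chained selection. As a rough illustration of how such a chain can behave, here is a minimal sketch that uses only the Java standard library. It is not webmagic's implementation: the `Selection` class and `ChainedExtractionDemo` are hypothetical names introduced for this example, the input is assumed to be well-formed XML rather than arbitrary HTML, and the sample text is an English stand-in for the string used in the README.

```java
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of webmagic: an immutable piece of text whose
// methods each return a new Selection, which is what makes chaining possible.
final class Selection {
    private final String text;

    Selection(String text) {
        this.text = text;
    }

    // Parse the text as well-formed XML, evaluate the XPath expression,
    // and keep the string value of the first matching node.
    Selection xpath(String expression) {
        try {
            String value = XPathFactory.newInstance().newXPath()
                    .evaluate(expression, new InputSource(new StringReader(text)));
            return new Selection(value);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate XPath: " + expression, e);
        }
    }

    // Keep capture group 1 of the first regex match, or an empty string
    // when there is no match or the group did not participate.
    Selection regex(String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        if (m.find() && m.groupCount() >= 1 && m.group(1) != null) {
            return new Selection(m.group(1));
        }
        return new Selection("");
    }

    @Override
    public String toString() {
        return text;
    }
}

class ChainedExtractionDemo {
    public static void main(String[] args) {
        String xml = "<html><body><div class='body'>"
                + "This passage is important: read the docs"
                + "</div></body></html>";
        String content = new Selection(xml)
                .xpath("//div[@class='body']")
                .regex("This passage is important: (.*)")
                .toString();
        System.out.println(content); // prints "read the docs"
    }
}
```

Returning a fresh object from every step keeps each intermediate result usable on its own, so the spelled-out method names from this commit read naturally left to right.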