Commit 5cb45af3 authored Aug 17, 2013 by yihua.huang
+doc
parent fd9ae6d9
Showing 4 changed files with 38 additions and 2 deletions
Page.java          webmagic-core/src/main/java/us/codecraft/webmagic/Page.java     +13 -2
Request.java       ...gic-core/src/main/java/us/codecraft/webmagic/Request.java    +2 -0
package.html       ...gic-core/src/main/java/us/codecraft/webmagic/package.html    +5 -0
ComboExtract.java  .../us/codecraft/webmagic/model/annotation/ComboExtract.java    +18 -0
webmagic-core/src/main/java/us/codecraft/webmagic/Page.java
@@ -8,7 +8,7 @@ import java.util.ArrayList;
 import java.util.List;

 /**
- * <pre>
+ * <pre class="zh">
  * Page保存了上一次抓取的结果,并可定义待抓取的链接内容。
  *
  * 主要方法:
@@ -19,6 +19,17 @@ import java.util.List;
  * {@link #addTargetRequests(java.util.List)} {@link #addTargetRequest(String)} 添加待抓取的链接
  *
  * </pre>
+ * <pre class="en">
+ * Store extracted result and urls to be crawled.
+ *
+ * Main method:
+ * {@link #getUrl()} get url of current page
+ * {@link #getHtml()} get content of current page
+ * {@link #putField(String, Object)} save extracted result
+ * {@link #getResultItems()} get extract results to be used in {@link us.codecraft.webmagic.pipeline.Pipeline}
+ * {@link #addTargetRequests(java.util.List)} {@link #addTargetRequest(String)} add urls to crawl
+ *
+ * </pre>
  *
  * @author code4crafter@gmail.com <br>
  */
@@ -44,7 +55,7 @@ public class Page {
     }

     /**
-     *
+     * 保存抽取的结果
      *
      * @param key 结果的key
      * @param field 结果的value
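As a rough illustration of the workflow the new English Javadoc describes (read the current page, save extracted fields, queue further URLs), the sketch below uses a hypothetical stand-in class, PageSketch, that mirrors only the method names listed in the comment. It is not webmagic's Page implementation: the return types, the getTargetRequests accessor, the process helper, and the example URL are all assumptions made so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Page; only the method names come from the Javadoc in this diff.
class PageSketch {
    private final String url;
    private final String html;
    private final Map<String, Object> resultItems = new LinkedHashMap<>();
    private final List<String> targetRequests = new ArrayList<>();

    PageSketch(String url, String html) {
        this.url = url;
        this.html = html;
    }

    String getUrl() { return url; }                      // url of the current page
    String getHtml() { return html; }                    // content of the current page (assumed to be a String here)

    // Save one extracted result under the given key.
    void putField(String key, Object field) { resultItems.put(key, field); }

    // Results that a pipeline stage would later consume (assumed to be a plain Map here).
    Map<String, Object> getResultItems() { return resultItems; }

    // Queue one or more urls to crawl.
    void addTargetRequest(String requestUrl) { targetRequests.add(requestUrl); }
    void addTargetRequests(List<String> urls) { targetRequests.addAll(urls); }

    // Assumed accessor, added only so the sketch can be inspected.
    List<String> getTargetRequests() { return targetRequests; }
}

class PageSketchDemo {
    // Hypothetical processing step: store one field and queue one follow-up url.
    static void process(PageSketch page) {
        page.putField("length", page.getHtml().length());
        page.addTargetRequest(page.getUrl() + "/next");
    }

    public static void main(String[] args) {
        PageSketch page = new PageSketch("http://example.com", "<html></html>");
        process(page);
        System.out.println(page.getResultItems());    // prints {length=13}
        System.out.println(page.getTargetRequests()); // prints [http://example.com/next]
    }
}
```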
webmagic-core/src/main/java/us/codecraft/webmagic/Request.java
@@ -5,6 +5,7 @@ import java.util.HashMap;
 import java.util.Map;

 /**
+ * <div class="zh">
  * Request对象封装了待抓取的url信息。<br/>
  * 在PageProcessor中,Request对象可以通过{@link us.codecraft.webmagic.Page#getRequest()} 获取。<br/>
  * <br/>
@@ -22,6 +23,7 @@ import java.util.Map;
  * String linktext = (String)page.getRequest().getExtra()[0];
  * }
  * </pre>
+ * </div>
  *
  * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
webmagic-core/src/main/java/us/codecraft/webmagic/package.html
 <html>
 <body>
+<div class="en">
+Main class "Spider" and models.
+</div>
+<div class="zh">
 包括webmagic入口类Spider和一些数据传递的实体类。
+</div>
 </body>
 </html>
webmagic-extension/src/main/java/us/codecraft/webmagic/model/annotation/ComboExtract.java
new file, mode 0 → 100644
+package us.codecraft.webmagic.model.annotation;
+
+import java.lang.annotation.ElementType;
+import java.lang.annotation.Retention;
+import java.lang.annotation.Target;
+
+/**
+ * @author code4crafter@gmail.com <br>
+ * Date: 13-8-16 <br>
+ * Time: 下午11:09 <br>
+ */
+@Retention(java.lang.annotation.RetentionPolicy.RUNTIME)
+@Target({ElementType.FIELD, ElementType.TYPE})
+public @interface ComboExtract {
+
+}
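Because the new annotation is retained at runtime and may target fields and types, it can be discovered through reflection. The following is a minimal sketch, assuming a local copy of the annotation so that the example compiles on its own. The ArticleModel class and the annotatedFieldNames helper are hypothetical and not part of this commit, and the diff does not show how webmagic itself consumes the annotation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Local copy of the marker annotation added in this commit (same retention and targets).
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.FIELD, ElementType.TYPE})
@interface ComboExtract {
}

// Hypothetical model class, used only for illustration.
@ComboExtract
class ArticleModel {
    @ComboExtract
    String title;

    String url; // deliberately not annotated
}

class ComboExtractDemo {
    // Returns the names of the declared fields of the given type that carry @ComboExtract.
    static List<String> annotatedFieldNames(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(ComboExtract.class)) {
                names.add(field.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // RUNTIME retention makes the annotation visible on the class itself...
        System.out.println(ArticleModel.class.isAnnotationPresent(ComboExtract.class)); // prints true
        // ...and on individual fields.
        System.out.println(annotatedFieldNames(ArticleModel.class)); // prints [title]
    }
}
```

With the default CLASS retention, both reflective checks would find nothing at runtime, which is presumably why the commit declares RUNTIME explicitly.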