Commit c79d6ecf authored Aug 17, 2013 by yihua.huang
complete all comments
parent 90bbe9b9
Showing 11 changed files with 9 additions and 611 deletions
ReplaceSelector.java  .../java/us/codecraft/webmagic/selector/ReplaceSelector.java  +1 -1
XpathSelector.java  ...in/java/us/codecraft/webmagic/selector/XpathSelector.java  +1 -1
UrlUtilsTest.java  ...c/test/java/us/codecraft/webmagic/utils/UrlUtilsTest.java  +0 -601
MultiPageModel.java  ...n/src/main/java/us/codecraft/webmagic/MultiPageModel.java  +1 -1
AfterExtractor.java  ...main/java/us/codecraft/webmagic/model/AfterExtractor.java  +1 -1
OOSpider.java  ...n/src/main/java/us/codecraft/webmagic/model/OOSpider.java  +1 -1
ExtractBy.java  ...ava/us/codecraft/webmagic/model/annotation/ExtractBy.java  +1 -1
JsonFilePageModelPipeline.java  ...odecraft/webmagic/pipeline/JsonFilePageModelPipeline.java  +1 -1
JsonFilePipeline.java  ...java/us/codecraft/webmagic/pipeline/JsonFilePipeline.java  +1 -1
RedisScheduler.java  .../java/us/codecraft/webmagic/scheduler/RedisScheduler.java  +1 -1
DoubleKeyMap.java  ...c/main/java/us/codecraft/webmagic/utils/DoubleKeyMap.java  +0 -1
webmagic-core/src/main/java/us/codecraft/webmagic/selector/ReplaceSelector.java
@@ -6,7 +6,7 @@ import java.util.regex.Pattern;
 import java.util.regex.PatternSyntaxException;
 
 /**
- * Replace selector。<br>
+ * Replace selector.<br>
  *
  * @author code4crafter@gmail.com <br>
  * @since 0.1.0
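The body of ReplaceSelector is not part of this diff; only its Javadoc summary and its `java.util.regex` imports are visible. As a rough illustration of what a regex-based replace selector could look like, here is a minimal sketch. The class name `ReplaceSelectorSketch`, its constructor, and the `select` method are assumptions made for this example and are not taken from webmagic.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical sketch, not the webmagic ReplaceSelector implementation.
final class ReplaceSelectorSketch {
    private final Pattern pattern;
    private final String replacement;

    ReplaceSelectorSketch(String regex, String replacement) {
        try {
            // Compile once so the pattern can be reused for every input.
            this.pattern = Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // Report a malformed pattern at construction time.
            throw new IllegalArgumentException("invalid regex: " + regex, e);
        }
        this.replacement = replacement;
    }

    // Replaces every match of the pattern in the text.
    String select(String text) {
        return pattern.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        ReplaceSelectorSketch stripTags = new ReplaceSelectorSketch("<[^>]+>", "");
        System.out.println(stripTags.select("<b>web</b>magic")); // prints "webmagic"
    }
}
```

Compiling the pattern in the constructor means a bad regex fails early instead of on the first page that is processed.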
webmagic-core/src/main/java/us/codecraft/webmagic/selector/XpathSelector.java
@@ -6,7 +6,7 @@ import java.util.ArrayList;
 import java.util.List;
 
 /**
- * XPath selector based on HtmlCleaner。<br>
+ * XPath selector based on HtmlCleaner.<br>
  *
  * @author code4crafter@gmail.com <br>
  * @since 0.1.0
webmagic-core/src/test/java/us/codecraft/webmagic/utils/UrlUtilsTest.java
This diff is collapsed.
webmagic-extension/src/main/java/us/codecraft/webmagic/MultiPageModel.java
@@ -5,7 +5,7 @@ import us.codecraft.webmagic.utils.Experimental;
 import java.util.Collection;
 
 /**
- * Extract an object of more than one pages, such as news and articles。<br>
+ * Extract an object of more than one pages, such as news and articles.<br>
  *
  * @author code4crafter@gmail.com <br>
  * @since 0.2.0
webmagic-extension/src/main/java/us/codecraft/webmagic/model/AfterExtractor.java
@@ -3,7 +3,7 @@ package us.codecraft.webmagic.model;
 import us.codecraft.webmagic.Page;
 
 /**
- * Interface to be implemented by page models that need to do something after fields are extracted。<br>
+ * Interface to be implemented by page models that need to do something after fields are extracted.<br>
  *
  * @author code4crafter@gmail.com <br>
  * @since 0.2.0
webmagic-extension/src/main/java/us/codecraft/webmagic/model/OOSpider.java
@@ -5,7 +5,7 @@ import us.codecraft.webmagic.Spider;
 import us.codecraft.webmagic.processor.PageProcessor;
 
 /**
- * The spider for page model extractor。<br>
+ * The spider for page model extractor.<br>
  * In webmagic, we call a POJO containing extract result as "page model". <br>
  * You can customize a crawler by write a page model with annotations. <br>
  * Such as:
webmagic-extension/src/main/java/us/codecraft/webmagic/model/annotation/ExtractBy.java
@@ -5,7 +5,7 @@ import java.lang.annotation.Retention;
 import java.lang.annotation.Target;
 
 /**
- * Define the extractor for field or class。<br>
+ * Define the extractor for field or class.<br>
  *
  * @author code4crafter@gmail.com <br>
  * @since 0.2.0
webmagic-extension/src/main/java/us/codecraft/webmagic/pipeline/JsonFilePageModelPipeline.java
@@ -14,7 +14,7 @@ import java.io.IOException;
 import java.io.PrintWriter;
 
 /**
- * Store results objects (page models) to files in JSON format。<br>
+ * Store results objects (page models) to files in JSON format.<br>
  * Use model.getKey() as file name if the model implements HasKey.<br>
  * Otherwise use SHA1 as file name.
  *
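The Javadoc above describes a file-naming rule: use `model.getKey()` when the model implements HasKey, otherwise fall back to SHA1. The pipeline code itself is not shown in this diff, so the following is only a sketch of that rule under stated assumptions. The interface `HasKeySketch`, the class `FileNameSketch`, the `.json` suffix, and the choice to hash the serialized JSON string are all hypothetical; the source does not say what input the SHA1 is computed over.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical stand-in for webmagic's HasKey interface.
interface HasKeySketch {
    String getKey();
}

// Hypothetical sketch of the naming rule, not the pipeline's actual code.
final class FileNameSketch {
    // Hex-encoded SHA-1 digest of a UTF-8 string.
    static String sha1Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // Assumption: the fallback hashes the serialized JSON of the model.
    static String fileNameFor(Object model, String json) {
        if (model instanceof HasKeySketch) {
            return ((HasKeySketch) model).getKey() + ".json";
        }
        return sha1Hex(json) + ".json";
    }
}
```

A keyed model keeps a stable, readable file name across runs, while the hash fallback gives unkeyed models a name that depends only on their content.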
webmagic-extension/src/main/java/us/codecraft/webmagic/pipeline/JsonFilePipeline.java
@@ -13,7 +13,7 @@ import java.io.IOException;
 import java.io.PrintWriter;
 
 /**
- * Store results to files in JSON format。<br>
+ * Store results to files in JSON format.<br>
  *
  * @author code4crafter@gmail.com <br>
  * @since 0.2.0
webmagic-extension/src/main/java/us/codecraft/webmagic/scheduler/RedisScheduler.java
@@ -9,7 +9,7 @@ import us.codecraft.webmagic.Request;
 import us.codecraft.webmagic.Task;
 
 /**
- * Use Redis as url scheduler for distributed crawlers。<br>
+ * Use Redis as url scheduler for distributed crawlers.<br>
  *
  * @author code4crafter@gmail.com <br>
  * @since 0.2.0
webmagic-extension/src/main/java/us/codecraft/webmagic/utils/DoubleKeyMap.java
@@ -92,7 +92,6 @@ public class DoubleKeyMap<K1, K2, V> extends MultiKeyMapBase {
             return null;
         }
         V remove = get(key1).remove(key2);
-        // If the parent map is now empty, reclaim it as well
         if (get(key1).size() == 0) {
             remove(key1);
         }
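The hunk above shows only the middle of a two-key `remove`: delete the value under the second key, then drop the inner map once it becomes empty. A self-contained sketch of that cleanup pattern follows. The class `DoubleKeyMapSketch` and its `put`, `get`, and `outerSize` methods are assumptions for illustration; the real DoubleKeyMap extends MultiKeyMapBase, whose code is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a two-level map; not the webmagic DoubleKeyMap.
final class DoubleKeyMapSketch<K1, K2, V> {
    private final Map<K1, Map<K2, V>> outer = new HashMap<>();

    void put(K1 key1, K2 key2, V value) {
        // Create the inner map on first use of key1.
        outer.computeIfAbsent(key1, k -> new HashMap<>()).put(key2, value);
    }

    V get(K1 key1, K2 key2) {
        Map<K2, V> inner = outer.get(key1);
        return inner == null ? null : inner.get(key2);
    }

    V remove(K1 key1, K2 key2) {
        Map<K2, V> inner = outer.get(key1);
        if (inner == null) {
            return null;
        }
        V removed = inner.remove(key2);
        // If the inner map is now empty, remove it from the outer map too,
        // so empty maps do not accumulate.
        if (inner.isEmpty()) {
            outer.remove(key1);
        }
        return removed;
    }

    // Number of first-level keys that still hold at least one entry.
    int outerSize() {
        return outer.size();
    }
}
```

Holding the inner map in a local variable avoids the repeated `get(key1)` lookups seen in the hunk, and `isEmpty()` states the intent more directly than `size() == 0`.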