Commit fb0797b6 (沈俊林 / webmagic)
Authored Jun 18, 2013 by yihua.huang
update docs
Parent: 8f954c79
Showing 50 changed files with 64 additions and 54 deletions
webmagic-core/src/main/java/us/codecraft/webmagic/Page.java +14 -4
...gic-core/src/main/java/us/codecraft/webmagic/Request.java +1 -1
webmagic-core/src/main/java/us/codecraft/webmagic/Site.java +1 -1
...agic-core/src/main/java/us/codecraft/webmagic/Spider.java +1 -1
webmagic-core/src/main/java/us/codecraft/webmagic/Task.java +1 -1
...ain/java/us/codecraft/webmagic/downloader/Downloader.java +2 -2
...s/codecraft/webmagic/downloader/HttpClientDownloader.java +1 -1
...java/us/codecraft/webmagic/downloader/HttpClientPool.java +1 -1
.../java/us/codecraft/webmagic/pipeline/ConsolePipeline.java +1 -1
...ain/java/us/codecraft/webmagic/pipeline/FilePipeline.java +1 -1
...rc/main/java/us/codecraft/webmagic/pipeline/Pipeline.java +1 -1
...n/java/us/codecraft/webmagic/processor/PageProcessor.java +1 -1
.../us/codecraft/webmagic/processor/SimplePageProcessor.java +1 -1
...codecraft/webmagic/schedular/FileCacheQueueSchedular.java +1 -1
.../java/us/codecraft/webmagic/schedular/QueueSchedular.java +1 -1
.../main/java/us/codecraft/webmagic/schedular/Schedular.java +1 -1
...re/src/main/java/us/codecraft/webmagic/selector/Html.java +1 -1
...c/main/java/us/codecraft/webmagic/selector/PlainText.java +1 -1
...main/java/us/codecraft/webmagic/selector/RegexResult.java +1 -1
...in/java/us/codecraft/webmagic/selector/RegexSelector.java +1 -1
.../java/us/codecraft/webmagic/selector/ReplaceSelector.java +1 -1
.../main/java/us/codecraft/webmagic/selector/Selectable.java +1 -1
...rc/main/java/us/codecraft/webmagic/selector/Selector.java +1 -1
.../java/us/codecraft/webmagic/selector/SelectorFactory.java +1 -1
.../us/codecraft/webmagic/selector/SmartContentSelector.java +1 -1
...in/java/us/codecraft/webmagic/selector/XpathSelector.java +1 -1
...e/src/main/java/us/codecraft/webmagic/utils/UrlUtils.java +1 -1
...ic-core/src/test/java/us/codecraft/webmagic/HtmlTest.java +1 -1
...ava/us/codecraft/webmagic/selector/RegexSelectorTest.java +1 -1
...ava/us/codecraft/webmagic/selector/XpathSelectorTest.java +1 -1
...c/test/java/us/codecraft/webmagic/utils/UrlUtilsTest.java +1 -1
...va/us/codecraft/webmagic/pipeline/FreemarkerPipeline.java +1 -1
...st/java/us/codecraft/webmagic/FreemarkerPipelineTest.java +1 -1
.../us/codecraft/webmagic/samples/DiandianBlogProcessor.java +1 -1
...java/us/codecraft/webmagic/samples/DianpingProcessor.java +1 -1
...va/us/codecraft/webmagic/samples/DiaoyuwengProcessor.java +1 -1
.../java/us/codecraft/webmagic/samples/F58PageProcesser.java +1 -1
...in/java/us/codecraft/webmagic/samples/HuxiuProcessor.java +1 -1
...java/us/codecraft/webmagic/samples/KaichibaProcessor.java +1 -1
...n/java/us/codecraft/webmagic/samples/MeicanProcessor.java +1 -1
...n/java/us/codecraft/webmagic/samples/NjuBBSProcessor.java +1 -1
.../codecraft/webmagic/samples/OschinaBlogPageProcesser.java +1 -1
...a/us/codecraft/webmagic/samples/OschinaPageProcesser.java +1 -1
...ava/us/codecraft/webmagic/samples/QzoneBlogProcessor.java +1 -1
...java/us/codecraft/webmagic/samples/SinaBlogProcesser.java +1 -1
...va/us/codecraft/webmagic/samples/TianyaPageProcesser.java +1 -1
...mples/src/test/java/us/codecraft/webmagic/SpiderTest.java +1 -1
...s/codecraft/webmagic/processor/DiandianProcessorTest.java +1 -1
...codecraft/webmagic/processor/DiaoyuwengProcessorTest.java +1 -1
...s/codecraft/webmagic/processor/SinablogProcessorTest.java +1 -1
webmagic-core/src/main/java/us/codecraft/webmagic/Page.java
@@ -10,10 +10,16 @@ import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 /**
- * Page stores the crawl result and can define the links to crawl next.
- * Author: code4crafter@gmail.com
- * Date: 13-4-21
- * Time: 11:22 AM
+ * <pre>
+ * Page stores the result of the previous crawl and can define the links still to be crawled.
+ *
+ * Main methods:
+ * {@link #getUrl()} returns the URL of the page
+ * {@link #getHtml()} returns the HTML content of the page
+ * {@link #addTargetRequests(java.util.List)} {@link #addTargetRequest(String)} add links to be crawled
+ *
+ * </pre>
+ * @author code4crafter@gmail.com <br>
  */
 public class Page {
@@ -34,6 +40,10 @@ public class Page {
     public Page() {
     }
+    /**
+     *
+     * @return fields
+     */
     public Map<String, Selectable> getFields() {
         return fields;
     }
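The new Javadoc for Page names three responsibilities: exposing the page URL, exposing its HTML, and collecting links to crawl next. The following is a minimal sketch of that shape, not the webmagic implementation. `SketchPage`, its constructor, and `getTargetRequests()` are hypothetical names introduced only for illustration; the diff shows neither the real fields nor the real return types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a crawled page; NOT the webmagic Page class. */
public class SketchPage {
    private final String url;
    private final String html;
    // Links discovered on this page that should be crawled later.
    private final List<String> targetRequests = new ArrayList<>();

    public SketchPage(String url, String html) {
        this.url = url;
        this.html = html;
    }

    public String getUrl() {
        return url;
    }

    public String getHtml() {
        return html;
    }

    /** Queues a single link; null or empty links are ignored (an assumption of this sketch). */
    public void addTargetRequest(String link) {
        if (link != null && !link.isEmpty()) {
            targetRequests.add(link);
        }
    }

    /** Queues several links by delegating to addTargetRequest. */
    public void addTargetRequests(List<String> links) {
        for (String link : links) {
            addTargetRequest(link);
        }
    }

    /** Read-only view of the queued links (hypothetical accessor). */
    public List<String> getTargetRequests() {
        return Collections.unmodifiableList(targetRequests);
    }

    public static void main(String[] args) {
        SketchPage page = new SketchPage("http://example.com/", "<html></html>");
        page.addTargetRequest("http://example.com/a");
        System.out.println(page.getTargetRequests());
    }
}
```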
webmagic-core/src/main/java/us/codecraft/webmagic/Request.java
@@ -17,7 +17,7 @@ package us.codecraft.webmagic;
  * String linktext = (String)page.getRequest().getExtra()[0];
  * }
  * </pre>
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 11:37 AM
  */
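The context lines of the Request hunk read an extra value back with `page.getRequest().getExtra()[0]`. A small sketch of that pattern follows, assuming a hypothetical `Request` class that stores extras as an `Object[]`. Only the `getExtra()[0]` read and the cast to `String` come from the diff; the constructor and everything else here are illustrative.

```java
public class RequestExtraSketch {

    /** Hypothetical request carrying a URL plus arbitrary extra values. */
    static final class Request {
        private final String url;
        private final Object[] extra;

        Request(String url, Object... extra) {
            this.url = url;
            this.extra = extra.clone(); // defensive copy
        }

        String getUrl() {
            return url;
        }

        Object[] getExtra() {
            return extra.clone();
        }
    }

    public static void main(String[] args) {
        // Attach the anchor text of the link that led to this URL.
        Request request = new Request("http://example.com/post/1", "Read more");
        // Same read as in the Javadoc: the caller must know the type at index 0.
        String linktext = (String) request.getExtra()[0];
        System.out.println(linktext);
    }
}
```

Because extras are untyped, the cast fails at runtime if the producer and consumer disagree on what sits at each index.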
webmagic-core/src/main/java/us/codecraft/webmagic/Site.java
@@ -4,7 +4,7 @@ import java.util.*;
 /**
  * Site defines the various settings of a site to be crawled.
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 12:13 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/Spider.java
@@ -14,7 +14,7 @@ import java.util.ArrayList;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 6:53 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/Task.java
 package us.codecraft.webmagic;
 /**
- * Author: code4crafer@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-6-18
  * Time: 2:57 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/downloader/Downloader.java
@@ -6,7 +6,7 @@ import us.codecraft.webmagic.Site;
 /**
  * Downloader is the interface webmagic uses to download pages. By default webmagic uses HttpComponent as its downloader, so you normally do not need to implement this interface yourself.
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 12:14 PM
  */
@@ -17,7 +17,7 @@ public interface Downloader {
      *
      * @param request
      * @param site
-     * @return
+     * @return page
      */
     public Page download(Request request, Site site);
 }
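The Downloader Javadoc says a custom implementation is rarely needed, but one case where it can help is testing without network access. The sketch below mirrors the `download(Request, Site)` signature from the diff. All types are hypothetical stand-ins nested in one class, and `InMemoryDownloader` is an invented name, not part of webmagic.

```java
import java.util.HashMap;
import java.util.Map;

public class DownloaderSketch {

    /** Hypothetical minimal request: just a URL. */
    static final class Request {
        final String url;
        Request(String url) { this.url = url; }
    }

    /** Hypothetical minimal site description. */
    static final class Site {
        final String domain;
        Site(String domain) { this.domain = domain; }
    }

    /** Hypothetical minimal page: URL plus raw HTML. */
    static final class Page {
        final String url;
        final String html;
        Page(String url, String html) { this.url = url; this.html = html; }
    }

    /** Same method shape as the interface shown in the diff. */
    interface Downloader {
        Page download(Request request, Site site);
    }

    /** Serves canned HTML from a map instead of the network. */
    static final class InMemoryDownloader implements Downloader {
        private final Map<String, String> pages = new HashMap<>();

        InMemoryDownloader put(String url, String html) {
            pages.put(url, html);
            return this;
        }

        @Override
        public Page download(Request request, Site site) {
            // The site argument is unused here; a real downloader would read
            // per-site settings from it. Unknown URLs yield null in this sketch.
            String html = pages.get(request.url);
            return html == null ? null : new Page(request.url, html);
        }
    }

    public static void main(String[] args) {
        Downloader downloader = new InMemoryDownloader().put("http://example.com/", "<p>hello</p>");
        Page page = downloader.download(new Request("http://example.com/"), new Site("example.com"));
        System.out.println(page.html);
    }
}
```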
webmagic-core/src/main/java/us/codecraft/webmagic/downloader/HttpClientDownloader.java
@@ -14,7 +14,7 @@ import us.codecraft.webmagic.utils.UrlUtils;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 12:15 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/downloader/HttpClientPool.java
@@ -18,7 +18,7 @@ import us.codecraft.webmagic.Site;
 import java.util.Map;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 12:29 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/pipeline/ConsolePipeline.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.selector.Selectable;
 import java.util.Map;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:45 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/pipeline/FilePipeline.java
@@ -12,7 +12,7 @@ import java.io.PrintWriter;
 import java.util.Map;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 6:28 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/pipeline/Pipeline.java
@@ -4,7 +4,7 @@ import us.codecraft.webmagic.Page;
 import us.codecraft.webmagic.Task;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:39 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/processor/PageProcessor.java
@@ -4,7 +4,7 @@ import us.codecraft.webmagic.Page;
 import us.codecraft.webmagic.Site;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 11:42 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/processor/SimplePageProcessor.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.utils.UrlUtils;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-22
  * Time: 9:15 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/schedular/FileCacheQueueSchedular.java
@@ -16,7 +16,7 @@ import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicInteger;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:13 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/schedular/QueueSchedular.java
@@ -10,7 +10,7 @@ import java.util.concurrent.BlockingQueue;
 import java.util.concurrent.LinkedBlockingQueue;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:13 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/schedular/Schedular.java
@@ -4,7 +4,7 @@ import us.codecraft.webmagic.Request;
 import us.codecraft.webmagic.Task;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:12 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/Html.java
@@ -4,7 +4,7 @@ import java.util.ArrayList;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 7:54 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/PlainText.java
@@ -6,7 +6,7 @@ import java.util.ArrayList;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 7:54 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexResult.java
 package us.codecraft.webmagic.selector;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 7:39 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/RegexSelector.java
@@ -9,7 +9,7 @@ import java.util.regex.Pattern;
 import java.util.regex.PatternSyntaxException;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 7:09 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/ReplaceSelector.java
@@ -6,7 +6,7 @@ import java.util.regex.Pattern;
 import java.util.regex.PatternSyntaxException;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 7:09 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/Selectable.java
@@ -3,7 +3,7 @@ package us.codecraft.webmagic.selector;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-20
  * Time: 7:51 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/Selector.java
@@ -3,7 +3,7 @@ package us.codecraft.webmagic.selector;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-20
  * Time: 8:02 PM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/SelectorFactory.java
@@ -7,7 +7,7 @@ import java.util.Map;
 import java.util.concurrent.ConcurrentHashMap;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 7:56 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/selector/SmartContentSelector.java
@@ -10,7 +10,7 @@ import java.util.concurrent.atomic.AtomicInteger;
 /**
  * Readability algorithm; the basic idea is to find the parent nodes of all p tags.
  * The code is fairly messy, and the final results are still being evaluated.
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 4:42 PM
  */
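The SmartContentSelector comment describes the idea only as finding the parent nodes of all p tags. One way that idea could be turned into a content extractor is sketched below. It is not the author's code: the scoring rule (total text length of direct `<p>` children per parent) is an assumption of this sketch, and it uses the JDK's XML parser, so the input must be well-formed XHTML. `ParagraphParentSketch` and `mainContent` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.IdentityHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParagraphParentSketch {

    /**
     * Returns the text of the element whose direct p children carry the most
     * text, or an empty string when the document has no p elements.
     */
    public static String mainContent(String xhtml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XHTML", e);
        }
        NodeList paragraphs = doc.getElementsByTagName("p");
        // Accumulated paragraph text length per parent node (identity-keyed).
        Map<Node, Integer> score = new IdentityHashMap<>();
        Node best = null;
        int bestScore = -1;
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Node p = paragraphs.item(i);
            Node parent = p.getParentNode();
            int total = score.merge(parent, p.getTextContent().trim().length(), Integer::sum);
            // Totals only grow, so tracking the running maximum is sufficient.
            if (total > bestScore) {
                bestScore = total;
                best = parent;
            }
        }
        return best == null ? "" : best.getTextContent().trim();
    }

    public static void main(String[] args) {
        String page = "<html><body><div><p>Home</p></div>"
                + "<div><p>A longer article paragraph.</p><p>Another one.</p></div></body></html>";
        System.out.println(mainContent(page));
    }
}
```

Real-world HTML is rarely well-formed, so a production version would need a lenient HTML parser in place of the XML parser used here.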
webmagic-core/src/main/java/us/codecraft/webmagic/selector/XpathSelector.java
@@ -6,7 +6,7 @@ import java.util.ArrayList;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 9:39 AM
  */
webmagic-core/src/main/java/us/codecraft/webmagic/utils/UrlUtils.java
@@ -6,7 +6,7 @@ import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:52 PM
  */
webmagic-core/src/test/java/us/codecraft/webmagic/HtmlTest.java
@@ -5,7 +5,7 @@ import org.junit.Test;
 import us.codecraft.webmagic.selector.Html;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 8:42 AM
  */
webmagic-core/src/test/java/us/codecraft/webmagic/selector/RegexSelectorTest.java
@@ -4,7 +4,7 @@ import junit.framework.Assert;
 import org.junit.Test;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 7:13 AM
  */
webmagic-core/src/test/java/us/codecraft/webmagic/selector/XpathSelectorTest.java
@@ -4,7 +4,7 @@ import org.junit.Assert;
 import org.junit.Test;
 /**
- * Author: code4crafter@gmail.com Date: 13-4-21 Time: 10:06 AM
+ * @author code4crafter@gmail.com <br> Date: 13-4-21 Time: 10:06 AM
  */
 public class XpathSelectorTest {
webmagic-core/src/test/java/us/codecraft/webmagic/utils/UrlUtilsTest.java
@@ -4,7 +4,7 @@ import org.junit.Assert;
 import org.junit.Test;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 2:22 PM
  */
webmagic-plugin/src/main/java/us/codecraft/webmagic/pipeline/FreemarkerPipeline.java
@@ -13,7 +13,7 @@ import java.io.IOException;
 import java.io.PrintWriter;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-6-8
  * Time: 9:00 PM
  */
webmagic-plugin/src/test/java/us/codecraft/webmagic/FreemarkerPipelineTest.java
@@ -6,7 +6,7 @@ import us.codecraft.webmagic.pipeline.FreemarkerPipeline;
 import java.io.IOException;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-6-9
  * Time: 7:14 AM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/DiandianBlogProcessor.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 8:08 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/DianpingProcessor.java
@@ -8,7 +8,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 8:08 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/DiaoyuwengProcessor.java
@@ -8,7 +8,7 @@ import us.codecraft.webmagic.selector.PlainText;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 8:08 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/F58PageProcesser.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:48 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/HuxiuProcessor.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 8:08 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/KaichibaProcessor.java
@@ -5,7 +5,7 @@ import us.codecraft.webmagic.Site;
 import us.codecraft.webmagic.processor.PageProcessor;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-5-20
  * Time: 5:31 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/MeicanProcessor.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-5-20
  * Time: 5:31 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/NjuBBSProcessor.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 8:08 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/OschinaBlogPageProcesser.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:48 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/OschinaPageProcesser.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:48 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/QzoneBlogProcessor.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 8:08 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/SinaBlogProcesser.java
@@ -5,7 +5,7 @@ import us.codecraft.webmagic.Page;
 import us.codecraft.webmagic.processor.PageProcessor;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:48 PM
  */
webmagic-samples/src/main/java/us/codecraft/webmagic/samples/TianyaPageProcesser.java
@@ -7,7 +7,7 @@ import us.codecraft.webmagic.processor.PageProcessor;
 import java.util.List;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-21
  * Time: 1:48 PM
  */
webmagic-samples/src/test/java/us/codecraft/webmagic/SpiderTest.java
@@ -8,7 +8,7 @@ import us.codecraft.webmagic.samples.HuxiuProcessor;
 import us.codecraft.webmagic.schedular.FileCacheQueueSchedular;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-4-20
  * Time: 7:46 PM
  */
webmagic-samples/src/test/java/us/codecraft/webmagic/processor/DiandianProcessorTest.java
@@ -11,7 +11,7 @@ import us.codecraft.webmagic.schedular.FileCacheQueueSchedular;
 import java.io.IOException;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-6-9
  * Time: 8:02 AM
  */
webmagic-samples/src/test/java/us/codecraft/webmagic/processor/DiaoyuwengProcessorTest.java
@@ -11,7 +11,7 @@ import us.codecraft.webmagic.schedular.FileCacheQueueSchedular;
 import java.io.IOException;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-6-9
  * Time: 8:02 AM
  */
webmagic-samples/src/test/java/us/codecraft/webmagic/processor/SinablogProcessorTest.java
@@ -11,7 +11,7 @@ import us.codecraft.webmagic.schedular.FileCacheQueueSchedular;
 import java.io.IOException;
 /**
- * Author: code4crafter@gmail.com
+ * @author code4crafter@gmail.com <br>
  * Date: 13-6-9
  * Time: 8:02 AM
  */