沈俊林
webmagic
Commits
f946fcdf
Commit
f946fcdf
authored
Aug 17, 2013
by
yihua.huang
backup docs in Chinese
parent
5cb45af3
Changes
68
Showing
68 changed files
with
1601 additions
and
0 deletions
+1601
-0
Page-cmnt.xml
zh_docs/us/codecraft/webmagic/Page-cmnt.xml
+91
-0
PagedModel-cmnt.xml
zh_docs/us/codecraft/webmagic/PagedModel-cmnt.xml
+14
-0
Request-cmnt.xml
zh_docs/us/codecraft/webmagic/Request-cmnt.xml
+56
-0
ResultItems-cmnt.xml
zh_docs/us/codecraft/webmagic/ResultItems-cmnt.xml
+27
-0
Site-cmnt.xml
zh_docs/us/codecraft/webmagic/Site-cmnt.xml
+147
-0
Spider-cmnt.xml
zh_docs/us/codecraft/webmagic/Spider-cmnt.xml
+90
-0
Task-cmnt.xml
zh_docs/us/codecraft/webmagic/Task-cmnt.xml
+26
-0
Destroyable-cmnt.xml
...ocs/us/codecraft/webmagic/downloader/Destroyable-cmnt.xml
+14
-0
Downloader-cmnt.xml
zh_docs/us/codecraft/webmagic/downloader/Downloader-cmnt.xml
+32
-0
FileDownloader-cmnt.xml
.../us/codecraft/webmagic/downloader/FileDownloader-cmnt.xml
+14
-0
HttpClientDownloader-cmnt.xml
...decraft/webmagic/downloader/HttpClientDownloader-cmnt.xml
+23
-0
HttpClientPool-cmnt.xml
.../us/codecraft/webmagic/downloader/HttpClientPool-cmnt.xml
+13
-0
package.cmnt
zh_docs/us/codecraft/webmagic/downloader/package.cmnt
+12
-0
AfterExtractor-cmnt.xml
zh_docs/us/codecraft/webmagic/model/AfterExtractor-cmnt.xml
+15
-0
ConsolePageModelPipeline-cmnt.xml
...odecraft/webmagic/model/ConsolePageModelPipeline-cmnt.xml
+13
-0
HasKey-cmnt.xml
zh_docs/us/codecraft/webmagic/model/HasKey-cmnt.xml
+24
-0
OOSpider-cmnt.xml
zh_docs/us/codecraft/webmagic/model/OOSpider-cmnt.xml
+22
-0
PageModelPipeline-cmnt.xml
...cs/us/codecraft/webmagic/model/PageModelPipeline-cmnt.xml
+13
-0
ComboExtract-cmnt.xml
...codecraft/webmagic/model/annotation/ComboExtract-cmnt.xml
+13
-0
ExtractBy-cmnt.xml
...us/codecraft/webmagic/model/annotation/ExtractBy-cmnt.xml
+45
-0
ExtractBy.Type-cmnt.xml
...decraft/webmagic/model/annotation/ExtractBy.Type-cmnt.xml
+6
-0
ExtractBy2-cmnt.xml
...s/codecraft/webmagic/model/annotation/ExtractBy2-cmnt.xml
+15
-0
ExtractBy2.Type-cmnt.xml
...ecraft/webmagic/model/annotation/ExtractBy2.Type-cmnt.xml
+6
-0
ExtractBy3-cmnt.xml
...s/codecraft/webmagic/model/annotation/ExtractBy3-cmnt.xml
+14
-0
ExtractBy3.Type-cmnt.xml
...ecraft/webmagic/model/annotation/ExtractBy3.Type-cmnt.xml
+6
-0
ExtractByRaw-cmnt.xml
...codecraft/webmagic/model/annotation/ExtractByRaw-cmnt.xml
+44
-0
ExtractByRaw.Type-cmnt.xml
...raft/webmagic/model/annotation/ExtractByRaw.Type-cmnt.xml
+6
-0
ExtractByUrl-cmnt.xml
...codecraft/webmagic/model/annotation/ExtractByUrl-cmnt.xml
+37
-0
HelpUrl-cmnt.xml
...s/us/codecraft/webmagic/model/annotation/HelpUrl-cmnt.xml
+28
-0
TargetUrl-cmnt.xml
...us/codecraft/webmagic/model/annotation/TargetUrl-cmnt.xml
+29
-0
package.cmnt
zh_docs/us/codecraft/webmagic/model/annotation/package.cmnt
+12
-0
package.cmnt
zh_docs/us/codecraft/webmagic/model/package.cmnt
+12
-0
package.cmnt
zh_docs/us/codecraft/webmagic/package.cmnt
+17
-0
ConsolePipeline-cmnt.xml
...s/us/codecraft/webmagic/pipeline/ConsolePipeline-cmnt.xml
+15
-0
FilePipeline-cmnt.xml
zh_docs/us/codecraft/webmagic/pipeline/FilePipeline-cmnt.xml
+27
-0
JsonFilePageModelPipeline-cmnt.xml
...raft/webmagic/pipeline/JsonFilePageModelPipeline-cmnt.xml
+28
-0
JsonFilePipeline-cmnt.xml
.../us/codecraft/webmagic/pipeline/JsonFilePipeline-cmnt.xml
+27
-0
PagedPipeline-cmnt.xml
...ocs/us/codecraft/webmagic/pipeline/PagedPipeline-cmnt.xml
+16
-0
Pipeline-cmnt.xml
zh_docs/us/codecraft/webmagic/pipeline/Pipeline-cmnt.xml
+14
-0
package.cmnt
zh_docs/us/codecraft/webmagic/pipeline/package.cmnt
+12
-0
PageProcessor-cmnt.xml
...cs/us/codecraft/webmagic/processor/PageProcessor-cmnt.xml
+27
-0
SimplePageProcessor-cmnt.xml
...codecraft/webmagic/processor/SimplePageProcessor-cmnt.xml
+14
-0
package.cmnt
zh_docs/us/codecraft/webmagic/processor/package.cmnt
+12
-0
FileCacheQueueScheduler-cmnt.xml
...craft/webmagic/scheduler/FileCacheQueueScheduler-cmnt.xml
+14
-0
QueueScheduler-cmnt.xml
...s/us/codecraft/webmagic/scheduler/QueueScheduler-cmnt.xml
+14
-0
RedisScheduler-cmnt.xml
...s/us/codecraft/webmagic/scheduler/RedisScheduler-cmnt.xml
+15
-0
Scheduler-cmnt.xml
zh_docs/us/codecraft/webmagic/scheduler/Scheduler-cmnt.xml
+29
-0
package.cmnt
zh_docs/us/codecraft/webmagic/scheduler/package.cmnt
+12
-0
AndSelector-cmnt.xml
zh_docs/us/codecraft/webmagic/selector/AndSelector-cmnt.xml
+13
-0
CssSelector-cmnt.xml
zh_docs/us/codecraft/webmagic/selector/CssSelector-cmnt.xml
+14
-0
Html-cmnt.xml
zh_docs/us/codecraft/webmagic/selector/Html-cmnt.xml
+14
-0
JsonPathSelector-cmnt.xml
.../us/codecraft/webmagic/selector/JsonPathSelector-cmnt.xml
+13
-0
OrSelector-cmnt.xml
zh_docs/us/codecraft/webmagic/selector/OrSelector-cmnt.xml
+13
-0
PlainText-cmnt.xml
zh_docs/us/codecraft/webmagic/selector/PlainText-cmnt.xml
+14
-0
RegexSelector-cmnt.xml
...ocs/us/codecraft/webmagic/selector/RegexSelector-cmnt.xml
+14
-0
ReplaceSelector-cmnt.xml
...s/us/codecraft/webmagic/selector/ReplaceSelector-cmnt.xml
+14
-0
Selectable-cmnt.xml
zh_docs/us/codecraft/webmagic/selector/Selectable-cmnt.xml
+75
-0
Selector-cmnt.xml
zh_docs/us/codecraft/webmagic/selector/Selector-cmnt.xml
+14
-0
SelectorFactory-cmnt.xml
...s/us/codecraft/webmagic/selector/SelectorFactory-cmnt.xml
+14
-0
SmartContentSelector-cmnt.xml
...codecraft/webmagic/selector/SmartContentSelector-cmnt.xml
+15
-0
XpathSelector-cmnt.xml
...ocs/us/codecraft/webmagic/selector/XpathSelector-cmnt.xml
+14
-0
package.cmnt
zh_docs/us/codecraft/webmagic/selector/package.cmnt
+12
-0
DoubleKeyMap-cmnt.xml
zh_docs/us/codecraft/webmagic/utils/DoubleKeyMap-cmnt.xml
+60
-0
FilePersistentBase-cmnt.xml
...s/us/codecraft/webmagic/utils/FilePersistentBase-cmnt.xml
+15
-0
MultiKeyMapBase-cmnt.xml
zh_docs/us/codecraft/webmagic/utils/MultiKeyMapBase-cmnt.xml
+13
-0
ThreadUtils-cmnt.xml
zh_docs/us/codecraft/webmagic/utils/ThreadUtils-cmnt.xml
+14
-0
UrlUtils-cmnt.xml
zh_docs/us/codecraft/webmagic/utils/UrlUtils-cmnt.xml
+22
-0
package.cmnt
zh_docs/us/codecraft/webmagic/utils/package.cmnt
+12
-0
zh_docs/us/codecraft/webmagic/Page-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page]]>
</key>
<data>
<![CDATA[ <pre class="zh">
Page保存了上一次抓取的结果,并可定义待抓取的链接内容。
主要方法:
{@link #getUrl()} 获取页面的Url
{@link #getHtml()} 获取页面的html内容
{@link #putField(String, Object)} 保存抽取的结果
{@link #getResultItems()} 获取抽取的结果,在 {@link us.codecraft.webmagic.pipeline.Pipeline} 中调用
{@link #addTargetRequests(java.util.List)} {@link #addTargetRequest(String)} 添加待抓取的链接
</pre>
<pre class="en">
Stores the extracted results and the URLs to be crawled.
Main methods:
{@link #getUrl()} get the URL of the current page
{@link #getHtml()} get the HTML content of the current page
{@link #putField(String, Object)} save an extracted result
{@link #getResultItems()} get the extracted results, for use in {@link us.codecraft.webmagic.pipeline.Pipeline}
{@link #addTargetRequests(java.util.List)} {@link #addTargetRequest(String)} add URLs to crawl
</pre>
@author code4crafter@gmail.com
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.putField(java.lang.String, java.lang.Object)]]>
</key>
<data>
<![CDATA[
@param key the key of the result
@param field the value of the result
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.getHtml()]]>
</key>
<data>
<![CDATA[ Gets the HTML content of the page.
@return html the HTML content of the page
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.addTargetRequests(java.util.List<java.lang.String>)]]>
</key>
<data>
<![CDATA[ Adds links to be crawled.
@param requests the links to be crawled
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.addTargetRequest(java.lang.String)]]>
</key>
<data>
<![CDATA[ Adds a link to be crawled.
@param requestString the link to be crawled
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.addTargetRequest(us.codecraft.webmagic.Request)]]>
</key>
<data>
<![CDATA[ Adds a page to be crawled; use this when extra information must be passed along.
@param request the page to be crawled
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.getUrl()]]>
</key>
<data>
<![CDATA[ Gets the URL of the page.
@return url the URL of the current page, which can be used for extraction
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.setUrl(us.codecraft.webmagic.selector.Selectable)]]>
</key>
<data>
<![CDATA[ Sets the URL.
@param url
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Page.getRequest()]]>
</key>
<data>
<![CDATA[ Gets the crawl request.
@return request the crawl request
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/PagedModel-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.PagedModel]]>
</key>
<data>
<![CDATA[ Implement this interface to support paginated crawling.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-4
<br>
Time: 5:18 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/Request-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Request]]>
</key>
<data>
<![CDATA[ <div class="zh">
A Request object encapsulates the information about a URL to be crawled.
<br/>
In a PageProcessor, the Request object can be obtained via {@link us.codecraft.webmagic.Page#getRequest()}.
<br/>
<br/>
A Request object has an extra property that can carry any required context, which is useful in some situations.
<br/>
<pre>
Example:
When crawling <a href="${link}">${linktext}</a>, you want to follow the link and keep the linktext information.
In the previous page:
public void process(Page page){
    Request request = new Request(link,linktext);
    page.addTargetRequest(request);
}
In the next page:
public void process(Page page){
    String linktext = (String)page.getRequest().getExtra()[0];
}
</pre>
</div>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 11:37 AM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Request(java.lang.String)]]>
</key>
<data>
<![CDATA[ Constructs a Request object.
@param url required; the URL to be crawled
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Request.setPriority(double)]]>
</key>
<data>
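<![CDATA[ Sets the priority, used to order the URL queue.<br>
Requires a Scheduler extension.
<br>
There is currently no Scheduler implementation that supports priorities.
<br>
@param priority the priority; larger values are placed earlier in the queue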
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Request.getUrl()]]>
</key>
<data>
<![CDATA[ Gets the URL to be crawled.
@return url the URL to be crawled
]]>
</data>
</comment>
</javadoc>
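The extra-context idea described in the Request comment above can be illustrated with a minimal, self-contained sketch. It does not use the webmagic classes: `SketchRequest` and `ExtraContextSketch` are hypothetical names introduced only for illustration, and the example URL is made up. The sketch assumes the extra values are kept as an object array alongside the URL, as the `getExtra()[0]` call in the comment suggests.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for a crawl request; this is not the webmagic Request class. */
class SketchRequest {
    final String url;
    final Object[] extra; // context carried from the page that discovered the URL

    SketchRequest(String url, Object... extra) {
        this.url = url;
        this.extra = extra;
    }
}

class ExtraContextSketch {
    /** Simulates the "previous page" step: queue the link with its link text attached. */
    static void discover(Queue<SketchRequest> queue, String link, String linkText) {
        queue.add(new SketchRequest(link, linkText));
    }

    /** Simulates the "next page" step: read back the context stored earlier. */
    static String linkTextOf(SketchRequest request) {
        return (String) request.extra[0];
    }

    public static void main(String[] args) {
        Queue<SketchRequest> queue = new ArrayDeque<>();
        discover(queue, "http://example.com/post/1", "First post");
        SketchRequest next = queue.poll();
        System.out.println(next.url + " <- " + linkTextOf(next));
    }
}
```

Keeping the context on the request itself means the second processing step needs no shared state with the first, which is presumably why the comment recommends it.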
zh_docs/us/codecraft/webmagic/ResultItems-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.ResultItems]]>
</key>
<data>
<![CDATA[ Holds the extraction results; produced by a PageProcessor and passed to {@link us.codecraft.webmagic.pipeline.Pipeline} for persistence.<br>
@author code4crafter@gmail.com
<br>
Date: 13-7-25
<br>
Time: 12:20 PM
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.ResultItems.isSkip()]]>
</key>
<data>
<![CDATA[ Whether to skip this page; a pipeline uses this to decide whether to process the page.
@return true if the page should be skipped
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.ResultItems.setSkip(boolean)]]>
</key>
<data>
<![CDATA[ Sets whether to skip this page; a pipeline uses this to decide whether to process the page.
@param skip
@return this
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/Site-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site]]>
</key>
<data>
<![CDATA[ Site defines the settings of a site to be crawled.<br>
The getter methods of this class are generally called only by the crawler framework itself.
<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 12:13 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.me()]]>
</key>
<data>
<![CDATA[ Creates a Site object; equivalent to new Site().
@return the newly created object
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.addCookie(java.lang.String, java.lang.String)]]>
</key>
<data>
<![CDATA[ Adds a cookie for this site, which can be used to crawl sites that require login. The cookie's domain is the same as {@link #getDomain()}.
@param name the cookie name
@param value the cookie value
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.setUserAgent(java.lang.String)]]>
</key>
<data>
<![CDATA[ Sets the user-agent for this site. Many sites restrict access by user-agent, so leaving this unset may produce unexpected results.
@param userAgent the user-agent string
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getCookies()]]>
</key>
<data>
<![CDATA[ Gets all cookies that have been set.
@return all cookies that have been set
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getUserAgent()]]>
</key>
<data>
<![CDATA[ Gets the user-agent that has been set.
@return the user-agent that has been set
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getDomain()]]>
</key>
<data>
<![CDATA[ Gets the domain that has been set.
@return the domain that has been set
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.setDomain(java.lang.String)]]>
</key>
<data>
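<![CDATA[ Sets the domain of this site; required.<br>
Crawling multiple domains is not currently supported. To crawl multiple domains, create a separate Spider for each.
@param domain the domain the crawler will fetch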
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.setCharset(java.lang.String)]]>
</key>
<data>
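<![CDATA[ Sets the page charset; if unset, it is detected automatically from the HTML meta information.<br>
Usually there is no need to set the encoding; set it only if the downloaded result is garbled.
<br>
@param charset the charset, mainly "utf-8" or "gbk"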
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getCharset()]]>
</key>
<data>
<![CDATA[ Gets the charset that has been set.
@return the charset that has been set
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.setAcceptStatCode(java.util.Set<java.lang.Integer>)]]>
</key>
<data>
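<![CDATA[ Sets the acceptable HTTP status codes; page content is read only when the status code is in this set.<br>
The default is 200; normally there is no need to set this.
<br>
Some sites return incorrect status codes, in which case this option can be adjusted.
<br>
@param acceptStatCode the acceptable status codes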
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getAcceptStatCode()]]>
</key>
<data>
<![CDATA[ Gets the acceptable status codes.
@return the acceptable status codes
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getStartUrls()]]>
</key>
<data>
<![CDATA[ Gets the list of start page URLs.
@return the list of start page URLs
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.addStartUrl(java.lang.String)]]>
</key>
<data>
<![CDATA[ Adds a start page URL; call this method repeatedly to add multiple start URLs.
@param startUrl the start page URL
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.setSleepTime(int)]]>
</key>
<data>
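<![CDATA[ Sets the interval between two fetches, to avoid putting too much load on the target site (or being blocked by a firewall).
@param sleepTime the interval in milliseconds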
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getSleepTime()]]>
</key>
<data>
<![CDATA[ Gets the interval between two fetches.
@return the interval between two fetches, in milliseconds
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.getRetryTimes()]]>
</key>
<data>
<![CDATA[ Gets the number of download retries; the default is 0.
@return the number of download retries
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Site.setRetryTimes(int)]]>
</key>
<data>
<![CDATA[ Sets the number of download retries; the default is 0.
@return this
]]>
</data>
</comment>
</javadoc>
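The Site comments above describe an accepted-status set (200 by default) and a retry count (0 by default). The sketch below shows one way these two settings could interact. It is an assumption, not the framework's actual download loop: `SitePolicySketch` and `attemptsUntilAccepted` are hypothetical names, and whether webmagic retries on an unaccepted status (rather than only on I/O failure) is not stated in the comments.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.IntSupplier;

class SitePolicySketch {
    /** Mirrors the documented default: only status 200 is accepted. */
    static final Set<Integer> DEFAULT_ACCEPT = Collections.singleton(200);

    /**
     * Calls fetchStatus up to retryTimes + 1 times.
     * Returns the 1-based attempt that produced an accepted status, or -1 if none did.
     */
    static int attemptsUntilAccepted(IntSupplier fetchStatus, Set<Integer> accept, int retryTimes) {
        for (int attempt = 1; attempt <= retryTimes + 1; attempt++) {
            if (accept.contains(fetchStatus.getAsInt())) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] statuses = {503, 200}; // a simulated server: fails once, then succeeds
        int[] next = {0};
        int result = attemptsUntilAccepted(() -> statuses[next[0]++], DEFAULT_ACCEPT, 1);
        System.out.println("accepted on attempt " + result);
    }
}
```

With the default retry count of 0 the loop runs exactly once, which matches the documented default of no re-download.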
zh_docs/us/codecraft/webmagic/Spider-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider]]>
</key>
<data>
<![CDATA[ <pre>
The entry class of the webmagic crawler.
Examples:
Define the simplest crawler:
Spider.create(new SimplePageProcessor("http://my.oschina.net/", "http://my.oschina.net/*blog/*")).run();
Use FilePipeline to save results to files:
Spider.create(new SimplePageProcessor("http://my.oschina.net/", "http://my.oschina.net/*blog/*"))
.pipeline(new FilePipeline("/data/temp/webmagic/")).run();
Use FileCacheQueueScheduler to cache URLs, so that after the crawler is shut down the next run resumes from the page where it stopped:
Spider.create(new SimplePageProcessor("http://my.oschina.net/", "http://my.oschina.net/*blog/*"))
.scheduler(new FileCacheQueueScheduler("/data/temp/webmagic/cache/")).run();
</pre>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 6:53 AM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider(us.codecraft.webmagic.processor.PageProcessor)]]>
</key>
<data>
<![CDATA[ Creates a Spider from the given extraction rules.
@param pageProcessor the extraction rules
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider.create(us.codecraft.webmagic.processor.PageProcessor)]]>
</key>
<data>
<![CDATA[ Creates a Spider from the given extraction rules.
@param pageProcessor the extraction rules
@return the newly created Spider
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider.startUrls(java.util.List<java.lang.String>)]]>
</key>
<data>
<![CDATA[ Resets startUrls, overriding the Site's own startUrls.
@param startUrls
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider.setUUID(java.lang.String)]]>
</key>
<data>
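<![CDATA[ Sets a unique ID for the crawler to identify the task. By default the domain is used as the uuid; when running multiple tasks on a single domain, give each task a different ID.
@param uuid the unique ID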
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider.scheduler(us.codecraft.webmagic.scheduler.Scheduler)]]>
</key>
<data>
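<![CDATA[ Sets the scheduler. The scheduler stores the URLs to be crawled and can handle deduplication, synchronization, persistence, and so on. By default an in-memory blocking queue is used.
@param scheduler the scheduler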
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider.pipeline(us.codecraft.webmagic.pipeline.Pipeline)]]>
</key>
<data>
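<![CDATA[ Sets the pipeline. A pipeline post-processes the final extraction results, for example saving them to a file or a database. By default results are printed to the console.
@param pipeline the pipeline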
@return this
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider.test(java.lang.String...)]]>
</key>
<data>
<![CDATA[ Tests the crawler with specific URLs.
@param urls the URLs to fetch
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Spider.thread(int)]]>
</key>
<data>
<![CDATA[ Downloads with multiple threads.
@param threadNum the number of threads
@return this
]]>
</data>
</comment>
</javadoc>
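The scheduler comment above mentions an in-memory blocking queue plus deduplication. A minimal sketch of that combination follows. It is not webmagic's QueueScheduler: `DedupQueueSketch` and its `push`/`poll` methods are hypothetical names, and it assumes deduplication is done by exact URL string.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

class DedupQueueSketch {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Queues the URL only the first time it is offered; returns true if it was queued. */
    boolean push(String url) {
        if (seen.add(url)) { // add() returns false when the URL was already seen
            queue.add(url);
            return true;
        }
        return false;
    }

    /** Returns the next URL in insertion order, or null when nothing is waiting. */
    String poll() {
        return queue.poll();
    }

    public static void main(String[] args) {
        DedupQueueSketch scheduler = new DedupQueueSketch();
        scheduler.push("http://example.com/a");
        scheduler.push("http://example.com/a"); // ignored as a duplicate
        scheduler.push("http://example.com/b");
        for (String url = scheduler.poll(); url != null; url = scheduler.poll()) {
            System.out.println(url);
        }
    }
}
```

Both collections are thread-safe, so several download threads could share one instance; persistence, which the comment also lists, is left out of this sketch.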
zh_docs/us/codecraft/webmagic/Task-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Task]]>
</key>
<data>
<![CDATA[ Abstract interface for a crawl task.<br>
@author code4crafter@gmail.com
<br>
Date: 13-6-18
Time: 2:57 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Task.getUUID()]]>
</key>
<data>
<![CDATA[ Returns a string that uniquely identifies this task, to distinguish it from other tasks.
@return uuid
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.Task.getSite()]]>
</key>
<data>
<![CDATA[ Returns the site information for this task.
@return site
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/downloader/Destroyable-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.Destroyable]]>
</key>
<data>
<![CDATA[ Resource-intensive services can implement this interface; the Spider calls destroy() when it finishes, to release resources.<br>
@author code4crafter@gmail.com
<br>
Date: 13-7-26
<br>
Time: 3:10 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/downloader/Downloader-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.Downloader]]>
</key>
<data>
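<![CDATA[ Downloader is the interface webmagic uses to download pages. By default webmagic uses HttpComponent as the downloader, so you generally do not need to implement this interface yourself.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 12:14 PM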
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.Downloader.download(us.codecraft.webmagic.Request, us.codecraft.webmagic.Task)]]>
</key>
<data>
<![CDATA[ Downloads a page and stores the information in a Page object.
@param request
@param task
@return page
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.Downloader.setThread(int)]]>
</key>
<data>
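<![CDATA[ Sets the number of threads; multithreaded programs generally need support from the Downloader.<br>
If multithreading is not a concern, this method need not be implemented.
<br>
@param thread the number of threads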
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/downloader/FileDownloader-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.FileDownloader]]>
</key>
<data>
<![CDATA[ Simulates downloading by reading files cached locally, so that only the extraction step runs within the Spider framework.<br>
@author code4crafer@gmail.com
Date: 13-6-24
Time: 7:24 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/downloader/HttpClientDownloader-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.HttpClientDownloader]]>
</key>
<data>
<![CDATA[ A downloader that wraps HttpClient. It implements retrying a specified number of times, gzip handling, custom UA/cookies, and more.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 12:15 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.HttpClientDownloader.download(java.lang.String)]]>
</key>
<data>
<![CDATA[ Convenience method for downloading a page directly.
@param url
@return
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/downloader/HttpClientPool-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader.HttpClientPool]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com <br>
Date: 13-4-21
Time: 12:29 PM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/downloader/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.downloader]]>
</key>
<data>
<![CDATA[
Contains the page download interface Downloader and its implementation HttpClientDownloader, which wraps the HttpComponent library.
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/AfterExtractor-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.AfterExtractor]]>
</key>
<data>
<![CDATA[ Implement this interface to run post-processing after extraction.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-3
<br>
Time: 9:42 AM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/ConsolePageModelPipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.ConsolePageModelPipeline]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com <br>
Date: 13-8-3
<br>
Time: 3:41 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/HasKey-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.HasKey]]>
</key>
<data>
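<![CDATA[ Identifies the key of a Model.<br>
A Model that implements this interface uses key() as its identifier on output (for example, as the persisted file name in JsonFilePageModelPipeline).
<br>
If the persisted file names come out garbled, add LANG=zh_CN.UTF-8 to the environment variables of the runtime.
<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-10
<br>
Time: 7:39 AM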
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.HasKey.key()]]>
</key>
<data>
<![CDATA[ The key is used as the identifier on output (for example, as the persisted file name in JsonFilePageModelPipeline).
@return key
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/OOSpider-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.OOSpider]]>
</key>
<data>
<![CDATA[ A Model-based Spider; the wrapped entry class.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-3
<br>
Time: 9:51 AM
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.OOSpider(us.codecraft.webmagic.Site, us.codecraft.webmagic.model.PageModelPipeline, java.lang.Class...)]]>
</key>
<data>
<![CDATA[ Creates a crawler.<br>
@param site
@param pageModelPipeline
@param pageModels
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/PageModelPipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.PageModelPipeline]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com <br>
Date: 13-8-3
<br>
Time: 9:34 AM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ComboExtract-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ComboExtract]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com <br>
Date: 13-8-16
<br>
Time: 11:09 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractBy-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractBy]]>
</key>
<data>
<![CDATA[ Defines the extraction rule for a class or a field.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-1
<br>
Time: 8:40 PM
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractBy.value]]>
</key>
<data>
<![CDATA[ The extraction rule.
@return the extraction rule
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractBy.type]]>
</key>
<data>
<![CDATA[ The type of the extraction rule. XPath, CSS selector, and regular expression are supported; the default is XPath.
@return the type of the extraction rule
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractBy.notNull]]>
</key>
<data>
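<![CDATA[ Whether this is a key field that must not be empty. If notNull is true and the field cannot be extracted, the whole object is discarded. The default is false.
@return whether this is a key field that must not be empty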
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractBy.multi]]>
</key>
<data>
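<![CDATA[ Whether to extract multiple results.<br>
When used on a field, a List<String> is needed to hold the results.
<br>
When used on a class, it means multiple objects are extracted from a single page.
<br>
@return whether to extract multiple results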
]]>
</data>
</comment>
</javadoc>
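The notNull rule described above (a missing key field discards the whole object) can be sketched without the annotation machinery. This is an illustration only: `NotNullSketch` and `keepIfComplete` are hypothetical names, and the extracted fields are modeled as a plain map rather than an annotated class.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NotNullSketch {
    /** Returns the extracted fields unchanged, or null when any required field is missing. */
    static Map<String, String> keepIfComplete(Map<String, String> extracted, List<String> requiredFields) {
        for (String field : requiredFields) {
            if (extracted.get(field) == null) {
                return null; // a required field was not extracted: discard the whole object
            }
        }
        return extracted;
    }

    public static void main(String[] args) {
        Map<String, String> extracted = new LinkedHashMap<>();
        extracted.put("title", "Hello");
        // "title" is present, so the object is kept.
        System.out.println(keepIfComplete(extracted, Arrays.asList("title")) != null);
        // "author" was never extracted, so the object is discarded.
        System.out.println(keepIfComplete(extracted, Arrays.asList("title", "author")) != null);
    }
}
```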
zh_docs/us/codecraft/webmagic/model/annotation/ExtractBy.Type-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractBy2-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractBy2]]>
</key>
<data>
<![CDATA[ Defines the extraction rule for a class or a field; can only be used after Extract or ExtractByRaw.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-1
<br>
Time: 8:40 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractBy2.Type-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractBy3-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractBy3]]>
</key>
<data>
<![CDATA[ Defines the extraction rule for a class or a field; can only be used after Extract or ExtractByRaw.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-1
<br>
Time: 8:40 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractBy3.Type-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractByRaw-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByRaw]]>
</key>
<data>
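<![CDATA[ For a class that already uses ExtractBy at the class level, use this annotation on a field that should be extracted from the entire page content.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-1
<br>
Time: 8:40 PM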
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByRaw.value]]>
</key>
<data>
<![CDATA[ The extraction rule.
@return the extraction rule
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByRaw.type]]>
</key>
<data>
<![CDATA[ The type of the extraction rule. XPath, CSS selector, and regular expression are supported; the default is XPath.
@return the type of the extraction rule
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByRaw.notNull]]>
</key>
<data>
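<![CDATA[ Whether this is a key field that must not be empty. If notNull is true and the field cannot be extracted, the whole object is discarded. The default is false.
@return whether this is a key field that must not be empty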
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByRaw.multi]]>
</key>
<data>
<![CDATA[ Whether to extract multiple results.<br>
A List<String> is needed to hold the results.
<br>
@return whether to extract multiple results
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractByRaw.Type-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/ExtractByUrl-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByUrl]]>
</key>
<data>
<![CDATA[ Defines the extraction rule for a class or a field (extracts from the URL; only regular expressions are supported).<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-1
<br>
Time: 8:40 PM
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByUrl.value]]>
</key>
<data>
<![CDATA[ The extraction rule; regular expressions are supported.
@return the extraction rule
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByUrl.notNull]]>
</key>
<data>
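<![CDATA[ Whether this is a key field that must not be empty. If notNull is true and the field cannot be extracted, the whole object is discarded. The default is false.
@return whether this is a key field that must not be empty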
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.ExtractByUrl.multi]]>
</key>
<data>
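<![CDATA[ Whether to extract multiple results.<br>
When used on a field, a List<String> is needed to hold the results.
<br>
When used on a class, it means multiple objects are extracted from a single page.
<br>
@return whether to extract multiple results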
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/HelpUrl-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.HelpUrl]]>
</key>
<data>
<![CDATA[ Defines helper URLs that assist crawling.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-1
<br>
Time: 8:40 PM
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.HelpUrl.value]]>
</key>
<data>
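<![CDATA[ The list of URL rules for a class.<br>
webmagic modifies regular-expression syntax: "." means only the literal character "." rather than any character, while "\*" stands for ".\*". For example, "http://\*.oschina.net/\*" matches URLs under all second-level domains of oschina.
<br>
@return the extraction rules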
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.HelpUrl.sourceRegion]]>
</key>
<data>
<![CDATA[ Specifies the region from which URLs are extracted (XPath only).
@return the region from which URLs are extracted
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/annotation/TargetUrl-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.TargetUrl]]>
</key>
<data>
<![CDATA[ Defines the scope and source of extraction for a class; sourceRegion can restrict the extraction region using XPath syntax.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-1
<br>
Time: 8:40 PM
<br>
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.TargetUrl.value]]>
</key>
<data>
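<![CDATA[ The list of URL patterns that correspond to a class.<br>
webmagic modifies regular-expression syntax: "." matches only the literal character "." rather than any character, and "\*" stands for ".\*". For example, "http://\*.oschina.net/\*" matches URLs under every subdomain of oschina.
<br>
@return the extraction rule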
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation.TargetUrl.sourceRegion]]>
</key>
<data>
<![CDATA[ Specifies the region from which URLs are extracted (XPath only).
@return the region from which URLs are extracted
]]>
</data>
</comment>
</javadoc>
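The HelpUrl and TargetUrl comments above describe a simplified pattern syntax in which "." is literal and "*" stands for ".*". A minimal sketch of how such a pattern could be turned into a standard java.util.regex.Pattern follows. The class and method names are hypothetical, the sketch writes the wildcard as a plain "*", and it leaves all other regex syntax untouched; the actual webmagic conversion may differ.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of webmagic.
final class UrlPatternSketch {

    // Converts the simplified syntax to a standard regex:
    // first escape every ".", then expand every "*" to ".*".
    // The order matters, so the "." introduced by ".*" is not escaped.
    static Pattern compile(String simplified) {
        String regex = simplified.replace(".", "\\.").replace("*", ".*");
        return Pattern.compile(regex);
    }

    static boolean matches(String simplified, String url) {
        return compile(simplified).matcher(url).matches();
    }

    public static void main(String[] args) {
        String pattern = "http://*.oschina.net/*";
        // A subdomain URL matches; a host without the literal dots does not.
        System.out.println(matches(pattern, "http://my.oschina.net/flashsword/blog"));
        System.out.println(matches(pattern, "http://myXoschina.net/blog"));
    }
}
```

Because only "." and "*" are rewritten, characters such as "?" or "+" in a URL pattern would still be interpreted as regex operators in this sketch.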
zh_docs/us/codecraft/webmagic/model/annotation/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model.annotation]]>
</key>
<data>
<![CDATA[
Annotations defined for webmagic's annotation-based crawling mode.
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/model/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.model]]>
</key>
<data>
<![CDATA[
webmagic's model-oriented wrapper (called PageModel) for writing crawlers. A PageProcessor can be implemented using only a POJO and annotations.
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic]]>
</key>
<data>
<![CDATA[
<div class="en">
Main class "Spider" and models.
</div>
<div class="zh">
Contains webmagic's entry class Spider and the entity classes used to pass data.
</div>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/pipeline/ConsolePipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.ConsolePipeline]]>
</key>
<data>
<![CDATA[ Prints extraction results to the console. Useful for testing.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 1:45 PM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/pipeline/FilePipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.FilePipeline]]>
</key>
<data>
<![CDATA[ Persists extraction results to files.
@author code4crafter@gmail.com <br>
Date: 13-4-21
Time: 6:28 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.FilePipeline()]]>
</key>
<data>
<![CDATA[ Creates a FilePipeline that uses the default save path "/data/webmagic/".
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.FilePipeline(java.lang.String)]]>
</key>
<data>
<![CDATA[ Creates a FilePipeline.
@param path the path where files are saved
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/pipeline/JsonFilePageModelPipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline]]>
</key>
<data>
<![CDATA[ Persists results to files in JSON format.<br>
If the persisted file names are garbled, add LANG=zh_CN.UTF-8 to the environment variables of the runtime.
<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 6:28 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline()]]>
</key>
<data>
<![CDATA[ Creates a JsonFilePageModelPipeline that uses the default save path "/data/webmagic/".
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.JsonFilePageModelPipeline(java.lang.String)]]>
</key>
<data>
<![CDATA[ Creates a JsonFilePageModelPipeline.
@param path the path where files are saved
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/pipeline/JsonFilePipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.JsonFilePipeline]]>
</key>
<data>
<![CDATA[ Persists results to files in JSON format.
@author code4crafter@gmail.com <br>
Date: 13-4-21
Time: 6:28 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.JsonFilePipeline()]]>
</key>
<data>
<![CDATA[ Creates a JsonFilePipeline that uses the default save path "/data/webmagic/".
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.JsonFilePipeline(java.lang.String)]]>
</key>
<data>
<![CDATA[ Creates a JsonFilePipeline.
@param path the path where files are saved
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/pipeline/PagedPipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.PagedPipeline]]>
</key>
<data>
<![CDATA[ A Pipeline that implements paging support.<br>
Do not use this feature when running a distributed crawler with redis.
<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-4
<br>
Time: 5:15 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/pipeline/Pipeline-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline.Pipeline]]>
</key>
<data>
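<![CDATA[ Pipeline is the interface for offline processing and persistence of data. Implement Pipeline to provide different persistence strategies (for example, saving to a database).
@author code4crafter@gmail.com <br>
Date: 13-4-21
Time: 1:39 PM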
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/pipeline/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.pipeline]]>
</key>
<data>
<![CDATA[
Contains the Pipeline interface, which processes page extraction results, and several of its implementations.
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/processor/PageProcessor-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.processor.PageProcessor]]>
</key>
<data>
<![CDATA[ The core interface for customizing a crawler. Implement PageProcessor to build a custom spider.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 11:42 AM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.processor.PageProcessor.process(us.codecraft.webmagic.Page)]]>
</key>
<data>
<![CDATA[ Defines how a page is processed, including link discovery and content extraction.
@param page the page to process
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.processor.PageProcessor.getSite()]]>
</key>
<data>
<![CDATA[ Defines configuration for the task, such as start URLs, crawl interval, custom cookies, and a custom user agent.
@return site
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/processor/SimplePageProcessor-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.processor.SimplePageProcessor]]>
</key>
<data>
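<![CDATA[ A very simple extractor. Links are extracted using the configured wildcard pattern, and the entire extracted content is saved to the content field.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-22
Time: 9:15 PM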
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/processor/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.processor]]>
</key>
<data>
<![CDATA[
Contains the PageProcessor interface, which encapsulates page-processing logic, and one implementation, SimplePageProcessor. Implement PageProcessor to build your own crawler.
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/scheduler/FileCacheQueueScheduler-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.scheduler.FileCacheQueueScheduler]]>
</key>
<data>
<![CDATA[ A URL management module backed by files on disk. If a long-running task is interrupted, the next start resumes from the point of interruption.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 1:13 PM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/scheduler/QueueScheduler-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.scheduler.QueueScheduler]]>
</key>
<data>
<![CDATA[ A thread-safe Scheduler backed by an in-memory queue.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 1:13 PM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/scheduler/RedisScheduler-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.scheduler.RedisScheduler]]>
</key>
<data>
<![CDATA[ Manages URLs with redis to build a distributed crawler.<br>
@author code4crafter@gmail.com
<br>
Date: 13-7-25
<br>
Time: 7:07 AM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/scheduler/Scheduler-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.scheduler.Scheduler]]>
</key>
<data>
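<![CDATA[ The interface for URL management and scheduling, covering the crawl queue, URL deduplication, and related functions.<br>
The Scheduler methods take a Task parameter, which is reserved for the case where a single Scheduler serves multiple Tasks (a Spider is a Task).
<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 1:12 PM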
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.scheduler.Scheduler.push(us.codecraft.webmagic.Request, us.codecraft.webmagic.Task)]]>
</key>
<data>
<![CDATA[ Adds a URL to be crawled.
@param request the request to be crawled
@param task the task, to support a single Scheduler serving multiple Tasks
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.scheduler.Scheduler.poll(us.codecraft.webmagic.Task)]]>
</key>
<data>
<![CDATA[ Returns the next URL to crawl.
@param task the task, to support a single Scheduler serving multiple Tasks
@return the next URL to crawl
]]>
</data>
</comment>
</javadoc>
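The Scheduler comments above describe two operations, push and poll, plus URL deduplication. A minimal in-memory sketch of that contract follows. It is not webmagic's QueueScheduler: the class name is hypothetical, plain strings stand in for Request, and the Task parameter is omitted for brevity.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical illustration only; not part of webmagic.
final class DedupQueueSketch {

    // FIFO queue of URLs waiting to be crawled.
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    // Every URL ever pushed, used for deduplication.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Adds a URL to the queue unless it has been pushed before.
    void push(String url) {
        if (seen.add(url)) {
            queue.add(url);
        }
    }

    // Returns the next URL to crawl, or null when the queue is empty.
    String poll() {
        return queue.poll();
    }

    public static void main(String[] args) {
        DedupQueueSketch scheduler = new DedupQueueSketch();
        scheduler.push("http://example.com/a");
        scheduler.push("http://example.com/b");
        scheduler.push("http://example.com/a"); // duplicate, ignored
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll());
    }
}
```

Set.add returns false for an element that is already present, so the membership check and the insertion into the seen set happen in one atomic step on the concurrent set.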
zh_docs/us/codecraft/webmagic/scheduler/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.scheduler]]>
</key>
<data>
<![CDATA[
Contains the Scheduler interface for URL management and scheduling, and several of its implementations.
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/AndSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.AndSelector]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com <br>
Date: 13-8-3
<br>
Time: 5:29 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/CssSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.CssSelector]]>
</key>
<data>
<![CDATA[ A CSS-style selector. Wraps Jsoup.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 9:39 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/Html-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Html]]>
</key>
<data>
<![CDATA[ Selectable HTML text.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 7:54 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/JsonPathSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.JsonPathSelector]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com <br>
Date: 13-8-12
<br>
Time: 12:54 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/OrSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.OrSelector]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com <br>
Date: 13-8-3
<br>
Time: 5:29 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/PlainText-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.PlainText]]>
</key>
<data>
<![CDATA[ Selectable plain text; does not implement XPath or CSS selector extraction.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 7:54 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/RegexSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.RegexSelector]]>
</key>
<data>
<![CDATA[ A regular-expression selector.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 7:09 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/ReplaceSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.ReplaceSelector]]>
</key>
<data>
<![CDATA[ Performs replacement on text.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 7:09 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/Selectable-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable]]>
</key>
<data>
<![CDATA[ Text that supports extraction.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-20
Time: 7:51 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.xpath(java.lang.String)]]>
</key>
<data>
<![CDATA[ select list with xpath
@param xpath
@return new Selectable after extract
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.$(java.lang.String)]]>
</key>
<data>
<![CDATA[ select list with css selector
@param selector css selector expression
@return new Selectable after extract
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.smartContent()]]>
</key>
<data>
<![CDATA[ select smart content with ReadAbility algorithm
@return content
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.links()]]>
</key>
<data>
<![CDATA[ select all links
@return all links
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.regex(java.lang.String)]]>
</key>
<data>
<![CDATA[ select list with regex
@param regex
@return new Selectable after extract
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.replace(java.lang.String, java.lang.String)]]>
</key>
<data>
<![CDATA[ replace with regex
@param regex
@param replacement
@return new Selectable after extract
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.toString()]]>
</key>
<data>
<![CDATA[ single string result
@return single string result
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selectable.all()]]>
</key>
<data>
<![CDATA[ multi string result
@return multi string result
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/Selector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.Selector]]>
</key>
<data>
<![CDATA[ An extractor.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-20
Time: 8:02 PM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/SelectorFactory-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.SelectorFactory]]>
</key>
<data>
<![CDATA[ A factory that creates selectors.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 7:56 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/SmartContentSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.SmartContentSelector]]>
</key>
<data>
<![CDATA[ Readability algorithm, based on finding the parent nodes of all p tags.
The code is rather messy, and the final result is still experimental.
@author code4crafter@gmail.com <br>
Date: 13-4-21
Time: 4:42 PM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/XpathSelector-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector.XpathSelector]]>
</key>
<data>
<![CDATA[ An XPath selector. Wraps HtmlCleaner.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 9:39 AM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/selector/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.selector]]>
</key>
<data>
<![CDATA[
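Provides tools for conveniently extracting page content. The core public interface is Selectable; internal extraction is customized by implementing Selector.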
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/utils/DoubleKeyMap-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap]]>
</key>
<data>
<![CDATA[ @author code4crafter@gmail.com
Date Dec 14, 2012
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap(java.util.Map<K1, java.util.Map<K2, V>>, java.lang.Class<? extends java.util.Map>)]]>
</key>
<data>
<![CDATA[ init map with protoMapClass
@param protoMapClass
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap.get(K1)]]>
</key>
<data>
<![CDATA[ @param key
@return map
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap.get(K1, K2)]]>
</key>
<data>
<![CDATA[ @param key1
@param key2
@return value
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap.put(K1, java.util.Map<K2, V>)]]>
</key>
<data>
<![CDATA[ @param key1
@param submap
@return
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap.put(K1, K2, V)]]>
</key>
<data>
<![CDATA[ @param key1
@param key2
@param value
@return
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap.remove(K1, K2)]]>
</key>
<data>
<![CDATA[ @param key1
@param key2
@return
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.DoubleKeyMap.remove(K1)]]>
</key>
<data>
<![CDATA[ @param key1
@return
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/utils/FilePersistentBase-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.FilePersistentBase]]>
</key>
<data>
<![CDATA[ Base class for file persistence.<br>
@author code4crafter@gmail.com
<br>
Date: 13-8-11
<br>
Time: 4:21 PM
<br>
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/utils/MultiKeyMapBase-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.MultiKeyMapBase]]>
</key>
<data>
<![CDATA[ multikey map, some basic objects *
@author yihua.huang
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/utils/ThreadUtils-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.ThreadUtils]]>
</key>
<data>
<![CDATA[ Thread utility class.<br>
@author code4crafer@gmail.com
Date: 13-6-23
Time: 7:11 PM
]]>
</data>
</comment>
</javadoc>
zh_docs/us/codecraft/webmagic/utils/UrlUtils-cmnt.xml
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:46 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.UrlUtils]]>
</key>
<data>
<![CDATA[ Utility class for URL and HTML handling.<br>
@author code4crafter@gmail.com
<br>
Date: 13-4-21
Time: 1:52 PM
]]>
</data>
</comment>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils.UrlUtils.canonicalizeUrl(java.lang.String, java.lang.String)]]>
</key>
<data>
<![CDATA[ Converts a relative URL to an absolute URL.
@param url the URL
@param refer the page the URL came from
@return the absolute URL
]]>
</data>
</comment>
</javadoc>
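The canonicalizeUrl comment above describes resolving a relative URL against the page it came from. A minimal sketch of that operation with the standard java.net.URI class follows. The class name is hypothetical and this is not the webmagic implementation, which may handle malformed input differently; URI.create throws IllegalArgumentException for strings that are not valid URIs.

```java
import java.net.URI;

// Hypothetical illustration only; not part of webmagic.
final class CanonicalizeSketch {

    // Resolves url against refer. An absolute url is returned unchanged;
    // a relative url is resolved against the referring page's location.
    static String canonicalize(String url, String refer) {
        return URI.create(refer).resolve(url).toString();
    }

    public static void main(String[] args) {
        String refer = "http://example.com/a/b.html";
        System.out.println(canonicalize("d.html", refer));
        System.out.println(canonicalize("../c.html", refer));
        System.out.println(canonicalize("/root", refer));
        System.out.println(canonicalize("http://other.org/x", refer));
    }
}
```

URI.resolve also normalizes the merged path, which is why a "../" segment is collapsed in the result.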
zh_docs/us/codecraft/webmagic/utils/package.cmnt
0 → 100644
View file @
f946fcdf
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<javadoc>
<meta>
<date-generated>
Sat Aug 17 14:14:45 CST 2013
</date-generated>
</meta>
<comment>
<key>
<![CDATA[us.codecraft.webmagic.utils]]>
</key>
<data>
<![CDATA[
Provides static utility classes for handling links.
]]>
</data>
</comment>
</javadoc>