>A crawler framework. It covers the whole lifecycle of a crawler: downloading, URL management, content extraction, and persistence. It can simplify the development of a specific crawler.
...
@@ -8,9 +10,11 @@ webmagic
* Simple core with high flexibility.
* Simple API for HTML extraction.
* Annotate a POJO to customize a crawler; no configuration required.
* Multi-threaded and distributed crawling support.
* Easy to integrate.
## Install:
Clone the repo and build:
...
@@ -95,4 +99,4 @@ There are some samples in `webmagic-samples` package.
### License:
Licensed under the [Apache 2.0 license](http://opensource.org/licenses/Apache-2.0)