沈俊林 / webmagic
Commit c23627bf, authored Jan 16, 2017 by xbynet@outlook.com
Fix 302 redirect handling in the POST/redirect/POST case
Parent: 7ca4ed04
Showing 2 changed files with 47 additions and 2 deletions (+47 -2):
...codecraft/webmagic/downloader/CustomRedirectStrategy.java  +44 -0
...us/codecraft/webmagic/downloader/HttpClientGenerator.java  +3 -2
webmagic-core/src/main/java/us/codecraft/webmagic/downloader/CustomRedirectStrategy.java (new file, mode 0 → 100644)
```java
package us.codecraft.webmagic.downloader;

import java.net.URI;

import org.apache.http.HttpRequest;
import org.apache.http.HttpResponse;
import org.apache.http.ProtocolException;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.methods.HttpRequestWrapper;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.LaxRedirectStrategy;
import org.apache.http.protocol.HttpContext;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Redirect strategy that supports following 302 redirects for POST requests.
 * HttpClient's default redirect handling:
 * httpClientBuilder.setRedirectStrategy(new LaxRedirectStrategy());
 * In the POST/redirect/POST case, the code above does not pass on the data of the
 * original request, so this class follows the redirect strategy of the SeimiCrawler project.
 * Original source: https://github.com/zhegexiaohuozi/SeimiCrawler/blob/master/project/src/main/java/cn/wanghaomiao/seimi/http/hc/SeimiRedirectStrategy.java
 */
public class CustomRedirectStrategy extends LaxRedirectStrategy {
    private Logger logger = LoggerFactory.getLogger(getClass());

    @Override
    public HttpUriRequest getRedirect(HttpRequest request, HttpResponse response, HttpContext context)
            throws ProtocolException {
        URI uri = getLocationURI(request, response, context);
        String method = request.getRequestLine().getMethod();
        if ("post".equalsIgnoreCase(method)) {
            try {
                // Reuse the original POST (including its entity) and point it at the redirect target
                HttpRequestWrapper httpRequestWrapper = (HttpRequestWrapper) request;
                httpRequestWrapper.setURI(uri);
                // Drop the stale Content-Length; HttpClient's request interceptors set it again on re-send
                httpRequestWrapper.removeHeaders("Content-Length");
                return httpRequestWrapper;
            } catch (Exception e) {
                logger.error("Failed to cast request to HttpRequestWrapper", e);
            }
            // Fallback: a POST to the new URI, without the original body
            return new HttpPost(uri);
        } else {
            return new HttpGet(uri);
        }
    }
}
```
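As a rough model of what the class above changes, the sketch below encodes the method-selection rule of HttpClient 4.x's `DefaultRedirectStrategy.getRedirect()` (which `LaxRedirectStrategy` inherits; Lax only widens the set of methods that are redirectable) next to the rule in `CustomRedirectStrategy`. `RedirectMethodModel` and its two methods are hypothetical names, and the model only returns the method name: the real strategies also build the new request object, and whether a redirect is followed at all is decided separately by `isRedirected()`.

```java
import java.util.Locale;

/** Hypothetical model (not HttpClient code) of which method a followed redirect uses. */
public class RedirectMethodModel {

    /** Approximates HttpClient 4.x DefaultRedirectStrategy.getRedirect(), inherited by LaxRedirectStrategy. */
    static String defaultRedirectMethod(String method, int status) {
        String m = method.toUpperCase(Locale.ROOT);
        if (m.equals("HEAD") || m.equals("GET")) {
            return m; // HEAD and GET keep their method
        }
        // Only 307 copies the original request (method and body); 301/302/303 fall back to GET.
        return status == 307 ? m : "GET";
    }

    /** Mirrors CustomRedirectStrategy: POST stays POST, everything else becomes GET (status is ignored). */
    static String customRedirectMethod(String method, int status) {
        return "POST".equalsIgnoreCase(method) ? "POST" : "GET";
    }

    public static void main(String[] args) {
        int[] statuses = {301, 302, 303, 307};
        for (int status : statuses) {
            System.out.printf("POST + %d -> default: %s, custom: %s%n",
                    status,
                    defaultRedirectMethod("POST", status),
                    customRedirectMethod("POST", status));
        }
    }
}
```

For 301, 302 and 303 the default column is GET while the custom column stays POST; only 307 keeps POST in both. Note that the custom rule also keeps POST on 303 See Other, where RFC 7231 expects the follow-up to be a retrieval request (GET or HEAD), and that it turns a redirected HEAD into a GET.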
webmagic-core/src/main/java/us/codecraft/webmagic/downloader/HttpClientGenerator.java
```diff
@@ -89,8 +89,9 @@ public class HttpClientGenerator {
...
             }
         });
     }
+    // Fix 302 redirect handling in the POST/redirect/POST case
+    httpClientBuilder.setRedirectStrategy(new CustomRedirectStrategy());
     SocketConfig socketConfig = SocketConfig.custom().setSoTimeout(site.getTimeOut()).setSoKeepAlive(true).setTcpNoDelay(true).build();
     httpClientBuilder.setDefaultSocketConfig(socketConfig);
     connectionManager.setDefaultSocketConfig(socketConfig);
...
```
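The hunk above wires the new strategy into `HttpClientBuilder`. To see the behavior the commit is after without pulling in HttpClient, here is a minimal JDK-only sketch (Java 9+, using `com.sun.net.httpserver` for a throwaway local server) of POST/redirect/POST: the client turns off automatic redirects and, when it gets a 3xx with a `Location` header, re-sends the original POST body to the new URI with a freshly computed Content-Length. The class name, the `/login` and `/target` paths and the form body are hypothetical; this illustrates the idea and is not how HttpClient implements it internally.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class PostRedirectPostDemo {

    /** POSTs body to uri; on a 3xx with a Location header, re-POSTs the same body to the new URI. */
    static String postFollowingRedirects(URI uri, byte[] body, int maxRedirects) throws IOException {
        for (int hop = 0; hop <= maxRedirects; hop++) {
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setInstanceFollowRedirects(false); // follow redirects manually below
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            conn.setDoOutput(true);
            conn.setFixedLengthStreamingMode(body.length); // Content-Length computed afresh on every hop
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            int status = conn.getResponseCode();
            String location = conn.getHeaderField("Location");
            if (status >= 300 && status < 400 && location != null) {
                uri = uri.resolve(location); // a relative Location is resolved against the current URI
                conn.disconnect();
                continue; // re-send the original body as POST
            }
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
        throw new IOException("Too many redirects");
    }

    /** Local server: /login answers 302 -> /target; /target echoes the method and body it received. */
    static String runDemo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/login", ex -> {
                ex.getRequestBody().readAllBytes(); // drain the POST body
                ex.getResponseHeaders().set("Location", "/target");
                ex.sendResponseHeaders(302, -1); // -1 means no response body
                ex.close();
            });
            server.createContext("/target", ex -> {
                String received = new String(ex.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
                byte[] reply = (ex.getRequestMethod() + " " + received).getBytes(StandardCharsets.UTF_8);
                ex.sendResponseHeaders(200, reply.length);
                try (OutputStream out = ex.getResponseBody()) {
                    out.write(reply);
                }
            });
            server.start();
            try {
                URI start = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/login");
                return postFollowingRedirects(start, "user=alice".getBytes(StandardCharsets.UTF_8), 5);
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints: POST user=alice
    }
}
```

Recomputing Content-Length on each hop is roughly the plain-JDK counterpart of the `removeHeaders("Content-Length")` call in `CustomRedirectStrategy`, which keeps the reused request from carrying its old header to the redirect target.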