Commit 32f1f2cf in 沈俊林 / webmagic
Authored Jul 29, 2017 by yihua.huang

#613 add charset to page

Parent: 65049bac
Showing 1 changed file with 10 additions and 15 deletions.
webmagic-core/src/main/java/us/codecraft/webmagic/downloader/HttpClientDownloader.java
@@ -113,7 +113,11 @@ public class HttpClientDownloader extends AbstractDownloader {
         Page page = new Page();
         page.setBytes(bytes);
         if (!request.isBinaryContent()){
-            page.setRawText(getResponseContent(charset, contentType, bytes));
+            if (charset == null) {
+                charset = getHtmlCharset(contentType, bytes);
+            }
+            page.setCharset(charset);
+            page.setRawText(new String(bytes, charset));
         }
         page.setUrl(new PlainText(request.getUrl()));
         page.setRequest(request);
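The first hunk changes the order of operations for text responses: a charset that is already known is used as-is; otherwise `getHtmlCharset` supplies one, and that name is both recorded with `page.setCharset` and used to decode the bytes. The sketch below reproduces that order outside webmagic under stated assumptions. It is not webmagic code: `CharsetFlowSketch`, `DecodedPage`, `decode`, and `detectOrDefault` are hypothetical names, and the detector is a stub that only models the "never returns null, falls back to the JVM default" behavior of the new `getHtmlCharset`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the decode order introduced in the first hunk.
// Not webmagic's API: names and types here are illustrative only.
public class CharsetFlowSketch {

    // Minimal stand-in for webmagic's Page: only the two fields this commit touches.
    static final class DecodedPage {
        final String charset;
        final String rawText;

        DecodedPage(String charset, String rawText) {
            this.charset = charset;
            this.rawText = rawText;
        }
    }

    // Stub detector (assumption): real detection is omitted; like the new
    // getHtmlCharset, it never returns null and falls back to the JVM default.
    static String detectOrDefault(String contentType, byte[] bytes) {
        return Charset.defaultCharset().name();
    }

    static DecodedPage decode(String configuredCharset, String contentType, byte[] bytes) {
        String charset = configuredCharset;
        if (charset == null) {
            charset = detectOrDefault(contentType, bytes);
        }
        // The same name is stored and used for decoding, so the two stay consistent.
        // Charset.forName throws unchecked exceptions for bad names, unlike the
        // String-name constructor used in the diff (UnsupportedEncodingException).
        return new DecodedPage(charset, new String(bytes, Charset.forName(charset)));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        DecodedPage explicit = decode("UTF-8", "text/html", body);
        DecodedPage fallback = decode(null, "text/html", body);
        System.out.println(explicit.charset + ": " + explicit.rawText);
        System.out.println(fallback.charset + ": " + fallback.rawText);
    }
}
```

Compared with the removed `getResponseContent`, which decoded with the platform default via `new String(bytes)` without reporting which charset it used, this order means the recorded charset always names the charset the text was actually decoded with.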
@@ -125,21 +129,12 @@ public class HttpClientDownloader extends AbstractDownloader {
         return page;
     }
 
-    private String getResponseContent(String charset, String contentType, byte[] bytes) throws IOException {
+    private String getHtmlCharset(String contentType, byte[] contentBytes) throws IOException {
+        String charset = CharsetUtils.detectCharset(contentType, contentBytes);
         if (charset == null) {
-            String htmlCharset = getHtmlCharset(contentType, bytes);
-            if (htmlCharset != null) {
-                return new String(bytes, htmlCharset);
-            } else {
-                logger.warn("Charset autodetect failed, use {} as charset. Please specify charset in Site.setCharset()", Charset.defaultCharset());
-                return new String(bytes);
-            }
-        } else {
-            return new String(bytes, charset);
+            charset = Charset.defaultCharset().name();
+            logger.warn("Charset autodetect failed, use {} as charset. Please specify charset in Site.setCharset()", Charset.defaultCharset());
         }
-    }
-
-    private String getHtmlCharset(String contentType, byte[] contentBytes) throws IOException {
-        return CharsetUtils.detectCharset(contentType, contentBytes);
+        return charset;
     }
 }
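After this change, `getHtmlCharset` wraps `CharsetUtils.detectCharset` and never returns null: if detection fails, it logs a warning and returns the JVM default charset name. The sketch below approximates that whole step in plain Java, without webmagic or an HTML parser. It assumes detection looks first at the `charset=` parameter of the Content-Type header and then at a `<meta ... charset=...>` declaration near the start of the body; webmagic's actual `CharsetUtils.detectCharset` may differ. `CharsetDetectSketch`, `findCharset`, and the 1024-byte sniff window are hypothetical choices, and a regular expression stands in for real HTML parsing.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical approximation of charset detection plus the default-charset
// fallback added in this commit. Not webmagic's implementation.
public class CharsetDetectSketch {

    // Matches charset=xyz in a Content-Type value or an HTML meta tag.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    // Returns the first supported charset name found in text, or null.
    static String findCharset(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        while (m.find()) {
            String name = m.group(1);
            try {
                if (Charset.isSupported(name)) {
                    return name;
                }
            } catch (IllegalCharsetNameException ignored) {
                // Malformed name: keep scanning for another declaration.
            }
        }
        return null;
    }

    static String detectCharset(String contentType, byte[] contentBytes) {
        // 1. Content-Type header parameter.
        String charset = findCharset(contentType);
        if (charset == null) {
            // 2. Sniff the start of the body. ISO-8859-1 maps every byte to one
            //    char, so ASCII meta declarations survive regardless of encoding.
            int len = Math.min(contentBytes.length, 1024);
            charset = findCharset(new String(contentBytes, 0, len, StandardCharsets.ISO_8859_1));
        }
        if (charset == null) {
            // 3. Same fallback as the commit: the JVM default charset name.
            charset = Charset.defaultCharset().name();
        }
        return charset;
    }

    public static void main(String[] args) {
        byte[] html = "<html><head><meta charset=\"ISO-8859-1\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detectCharset("text/html; charset=UTF-8", html));
        System.out.println(detectCharset("text/html", html));
        System.out.println(detectCharset(null, new byte[0]));
    }
}
```

Because the fallback lives inside the detection method, callers such as the first hunk can pass the result straight to `new String(bytes, charset)` without a separate null check, which is the simplification this commit makes.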