Some ancient legacy code suddenly threw this error today:
Invalid control character at: line 1 column
import gzip
import json
import logging
import StringIO  # Python 2 module

# page_indexes (hypothetical name), base_url and do_get_no_proxy()
# are defined elsewhere in the original script.
for index in page_indexes:
    url = base_url.format(index)
    # Build the request headers
    header = {
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.59 Safari/537.36",
        "Connection": "keep-alive",
        "X-Requested-With": "XMLHttpRequest",
        "Accept-Encoding": "gzip, deflate",
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Language": "en-US,en;q=0.8",
        # "Cookie": cookie
    }
    try:
        # Send the request
        response = do_get_no_proxy(url, header)
        if response == "Error":
            print "request 502: {}".format(index)
            continue
        # The body is gzip-compressed: decompress it in memory
        resdata = StringIO.StringIO(response)
        gzipper = gzip.GzipFile(fileobj=resdata).read()
        res_json = json.loads(gzipper)
        res = res_json['records']
    except Exception as e:
        logging.error(e)
        continue
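Tracing it down, the failing line is res_json = json.loads(gzipper).
The cause: the decompressed content in gzipper does not pass the json module's strict check, because its string values contain raw control characters such as \r\n.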
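A minimal reproduction in Python 3, assuming the response contained a raw CR+LF inside a string value (the real response is not shown in the post). Control characters between tokens are ordinary JSON whitespace and parse fine; only unescaped ones inside strings are rejected. The strict parameter behaves the same way in Python 2's json module.

```python
import json

# Hypothetical payload: a string value containing a raw (unescaped) CR+LF
raw = '{"records": ["line one\r\nline two"]}'

try:
    json.loads(raw)  # default strict=True rejects control characters in strings
    strict_failed = False
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    strict_failed = True
    print("strict parse failed:", exc)

# strict=False allows control characters inside strings
lenient = json.loads(raw, strict=False)
print(lenient["records"][0].splitlines())  # ['line one', 'line two']
```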
The fix is to pass strict=False: after changing the call to res_json = json.loads(gzipper, strict=False), the error goes away.
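For anyone porting this to Python 3, a sketch of the same decompress-then-parse step using gzip.decompress instead of StringIO + GzipFile. The helper name parse_gzipped_json and the UTF-8 payload are assumptions for illustration, and the compressed body is simulated locally rather than fetched.

```python
import gzip
import json
from typing import Any


def parse_gzipped_json(body: bytes) -> Any:
    """Hypothetical helper: decompress a gzip body and parse it as JSON."""
    text = gzip.decompress(body).decode("utf-8")  # assumes a UTF-8 payload
    # strict=False tolerates raw control characters such as \r\n in strings
    return json.loads(text, strict=False)


# Simulate a gzip-compressed response containing a raw CR+LF in a string
payload = '{"records": [{"title": "a\r\nb"}]}'.encode("utf-8")
body = gzip.compress(payload)

res_json = parse_gzipped_json(body)
res = res_json["records"]
print(res)
```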
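I also picked up a few related json options along the way:
Content containing binary (non-UTF-8) data, using the encoding parameter that Python 2's json module provides: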
strdata = json.dumps(jsondata, encoding='latin1')
res_json = json.loads(strdata, encoding='latin1', strict=False)
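In Python 3 the encoding argument is gone (json.dumps has no such parameter, and json.loads ignored it before dropping it in 3.9), so one option is to decode the bytes yourself before parsing. A sketch assuming Latin-1 input; the sample bytes are made up.

```python
import json

# Hypothetical bytes: valid Latin-1 (0xE9 is "é") but not valid UTF-8,
# plus a raw CR+LF inside a string value
raw = b'{"name": "caf\xe9", "note": "x\r\ny"}'

# Decode explicitly, then parse leniently
data = json.loads(raw.decode("latin-1"), strict=False)
print(data["name"])  # café

# Serialize back to Latin-1 bytes; ensure_ascii=False keeps "é" unescaped
out = json.dumps(data, ensure_ascii=False).encode("latin-1")
```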
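Plain text, where ensure_ascii=False writes non-ASCII characters (such as Chinese) as-is instead of \uXXXX escapes:
strdata = json.dumps(jsondata, ensure_ascii=False)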
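A quick Python 3 comparison of the two ensure_ascii settings; the sample dictionary is made up. Both outputs decode to the same data, only the serialized text differs.

```python
import json

jsondata = {"city": "北京"}  # hypothetical sample data

escaped = json.dumps(jsondata)                       # default ensure_ascii=True
readable = json.dumps(jsondata, ensure_ascii=False)  # keep non-ASCII as-is

print(escaped)   # {"city": "\u5317\u4eac"}
print(readable)  # {"city": "北京"}
```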