Small one-off tasks come up all the time, especially in crawling: grab some data from site X, just this once. Parsing the response field by field is tedious, and while some sites do expose the data, they charge for it, which is frustrating. Below is a generic way to turn JSON responses into an Excel sheet.
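The full script follows below; the core of the idea — use the first record's keys as the header row and coerce each record's values into a row of strings — can be sketched in isolation first (the sample payload here is made up for illustration):

```python
import json

# hypothetical sample: the kind of JSON payload a crawler might store
jsontxt = json.dumps({
    "Code": 200,
    "Data": {"ItemList": [
        {"Rank": 1, "Brand": "A", "Score": 95.5},
        {"Rank": 2, "Brand": "B", "Score": 88.0},
    ]},
})

data = json.loads(jsontxt)
items = data["Data"]["ItemList"]

# headers come straight from the first record's keys
headers = tuple(items[0].keys())
# every value is coerced to a string so one row type fits all columns
rows = [tuple(str(v) for v in item.values()) for item in items]

print(headers)  # ('Rank', 'Brand', 'Score')
print(rows[0])  # ('1', 'A', '95.5')
```

No per-field parsing is needed: whatever keys the JSON carries become the columns.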
# -*- coding: utf-8 -*-
"""
Brand ranking list
Created by scz on 2022-01-07.
"""
import json

import requests
import tablib

from utils import get_logger
from storage.databases.defaultdata_txt import DefaultData

defaultdata = DefaultData()
session = requests.Session()
logger = get_logger('StarMap')

datass = []    # collected rows, shared across calls
Headerss = ()  # column headers, taken from the JSON keys


class StarMapScheduler:

    def service(self, url, time_string):
        global Headerss
        sql = ('select id,userId,jsontxt from basic_yien_json '
               'where crawler_time>%s and id > 1 and url =%s')
        req_info = [time_string, url]
        id_list = defaultdata.get_data(sql, req_info)
        logger.info('fetched %s rows', len(id_list))
        for row in id_list:
            user_id = row[1]
            jsontxt = row[2]
            logger.debug('userId=%s', user_id)
            data = json.loads(jsontxt)
            if data['Code'] == 200:
                item_list = data['Data']['ItemList']
                # the header row comes straight from the first record's keys
                Headerss = tuple(item_list[0].keys())
                for item in item_list:
                    # force every value to a string so one row type fits all columns
                    datass.append(tuple(str(v) for v in item.values()))


if __name__ == '__main__':
    scheduler = StarMapScheduler()
    url = 'https://xsapi.endata.com.cn/v1/Blogger/BusinessBloggerList'
    time_string = '2021-12-01 00:18:00'
    scheduler.service(url=url, time_string=time_string)
    dataset = tablib.Dataset(*datass, headers=Headerss)
    with open('data.xls', 'wb') as f:
        f.write(dataset.xls)
This design has several advantages:
1 The raw JSON is read back from a database table, so if parsing goes wrong it can be retried as many times as needed
2 Where possible, store the crawl (insert) time with each record, so batches can be selected by time
3 `global` is used to share the collected rows and headers between the scheduler and the export step (the Python 3 `global` keyword)
4 The JSON is processed in batches instead of being parsed field by field
5 The header row is taken from the JSON keys and can be renamed later
Drawbacks
1 Values are handled as a flat list; if a value is itself JSON (a nested dict or list), it is currently forced into a string
2 If the field layout of the response changes, the code has to be updated
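A possible workaround for the first drawback (a sketch, not part of the original script): serialize nested values with `json.dumps` instead of `str`, so dicts and lists land in the cell as valid JSON text rather than Python reprs:

```python
import json

def cell(value):
    """Render one JSON value as spreadsheet cell text.
    Nested dicts/lists are kept as valid JSON instead of a Python repr."""
    if isinstance(value, (dict, list)):
        return json.dumps(value, ensure_ascii=False)
    return str(value)

# hypothetical record with nested values
item = {"Brand": "A", "Tags": ["food", "retail"], "Meta": {"region": "CN"}}
row = tuple(cell(v) for v in item.values())
print(row)  # ('A', '["food", "retail"]', '{"region": "CN"}')
```

The JSON text in the cell can then be re-parsed later if the nested structure is ever needed again.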