Question:

python3.x - How do I fix "TypeError: cannot pickle '_thread.lock' object" when running a Python crawler with multiple processes?

华易安

2024-03-07

Running my Python crawler with multiple processes raises: TypeError: cannot pickle '_thread.lock' object

```python
# coding=utf-8
"""
    @project: 15python_spider
    @Author: frank
    @file: 01_xiaomi_app.py
    @date: 2024/3/7 19:52
"""
import json
import time
from multiprocessing import Process
from queue import Queue

import requests


class XiaomiSpider(object):
    def __init__(self):
        self.url = 'http://app.mi.com/categotyAllListApi?page={}&categoryId=2&pageSize=30'
        self.headers = {'User-Agent': 'Mozilla/5.0'}
        # URL queue
        self.url_queue = Queue()
        self.n = 0
        self.app_list = []

    # Put the URLs into the queue
    def url_in(self):
        for i in range(6):
            url = self.url.format(i)
            # Enqueue
            self.url_queue.put(url)

    # Thread worker function
    def get_data(self):
        while True:
            # Stop when self.url_queue is empty
            if self.url_queue.empty():
                break
            # Get a URL, then request + parse + save
            url = self.url_queue.get()
            html = requests.get(
                url=url,
                headers=self.headers
            ).content.decode('utf-8')
            html = json.loads(html)
            # Parse the data
            for app in html['data']:
                # App name
                app_name = app['displayName']
                app_link = 'https://app.mi.com/details?id={}'.format(app['packageName'])
                app_info = {
                    'app_name': app_name,
                    'app_link': app_link
                }
                self.app_list.append(app_info)
                self.n += 1
            print(url)

    # Main function
    def main(self):
        # Enqueue the URLs
        self.url_in()
        t_list = []
        for i in range(5):
            t = Process(target=self.get_data)
            t_list.append(t)
            t.start()
        for i in t_list:
            i.join()
        with open('app_list.json', 'w') as f:
            json.dump(self.app_list, f, ensure_ascii=False)
        print('Number of apps:', self.n)


if __name__ == "__main__":
    start = time.time()
    spider = XiaomiSpider()
    spider.main()
    end = time.time()
    print('Elapsed time: %.2f' % (end - start))
```
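The error can be reproduced without any network access or child processes. With the `spawn` start method (the default on Windows, and on macOS since Python 3.8), `Process(target=self.get_data)` has to pickle the bound method, which pickles the `XiaomiSpider` instance and therefore its `queue.Queue`, whose internal lock cannot be pickled. With the `fork` start method (the traditional default on Linux) nothing is pickled, so the same code may not raise at all. The sketch below uses a hypothetical `SpiderLike` class that keeps only the queue, plus a hypothetical `pickle_error` helper.

```python
import pickle
import queue


class SpiderLike:
    """Hypothetical stand-in for XiaomiSpider: it only keeps a queue.Queue."""

    def __init__(self):
        # queue.Queue is a thread-only queue; it stores a threading lock internally
        self.url_queue = queue.Queue()

    def get_data(self):
        pass


def pickle_error(obj):
    """Return the TypeError message if obj cannot be pickled, else None."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    spider = SpiderLike()
    # Pickling the bound method also pickles the instance it is bound to,
    # which is what "spawn" does with Process(target=self.get_data).
    print(pickle_error(spider.get_data))
```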


1 answer

桂阳文

2024-03-07

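You need `from multiprocessing import Queue` instead of `from queue import Queue`. `queue.Queue` is meant for threads and holds a `_thread.lock`, which cannot be pickled, so it cannot be sent to a child process when `Process(target=self.get_data)` pickles `self`.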
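Switching to `multiprocessing.Queue` removes the pickling error, but the question's code has a second problem: each child process appends to its own copy of `self.app_list` and `self.n`, so the parent would still write an empty list and report 0 apps. Below is a minimal sketch, assuming a module-level `worker` function, a second queue that carries results back to the parent, and one `None` sentinel per worker in place of the `empty()` check (which the `multiprocessing` docs describe as unreliable). `fetch_apps` is a hypothetical placeholder for the `requests` + `json` parsing in the question, so the example runs offline.

```python
import multiprocessing as mp

NUM_WORKERS = 5
SENTINEL = None  # tells a worker there are no more URLs / tells the parent a worker is done


def fetch_apps(url):
    """Hypothetical placeholder for requests.get(...) + json parsing.

    The real version would return one {'app_name': ..., 'app_link': ...}
    dict per entry in html['data'].
    """
    return [{"app_name": "placeholder", "app_link": url}]


def worker(url_queue, result_queue):
    """Consume URLs until the sentinel arrives, sending results to the parent."""
    while True:
        url = url_queue.get()  # blocking get instead of a racy empty() check
        if url is SENTINEL:
            break
        for app_info in fetch_apps(url):
            result_queue.put(app_info)
    result_queue.put(SENTINEL)  # signal that this worker has finished


def crawl(urls, num_workers=NUM_WORKERS):
    url_queue = mp.Queue()
    result_queue = mp.Queue()
    for url in urls:
        url_queue.put(url)
    for _ in range(num_workers):
        url_queue.put(SENTINEL)  # one stop signal per worker

    procs = [mp.Process(target=worker, args=(url_queue, result_queue))
             for _ in range(num_workers)]
    for p in procs:
        p.start()

    # Drain results before join(): joining a process that still has
    # queued data that was never consumed can deadlock.
    app_list, finished = [], 0
    while finished < num_workers:
        item = result_queue.get()
        if item is SENTINEL:
            finished += 1
        else:
            app_list.append(item)

    for p in procs:
        p.join()
    return app_list


# The __main__ guard is required with "spawn", where children re-import this module.
if __name__ == "__main__":
    base = "http://app.mi.com/categotyAllListApi?page={}&categoryId=2&pageSize=30"
    apps = crawl([base.format(i) for i in range(6)])
    print("Number of apps:", len(apps))
```

Because the parent now collects every record from `result_queue`, the returned list can be dumped to `app_list.json` as in the original `main()`. If the work is only I/O-bound HTTP requests, replacing `Process` with `threading.Thread` (which the "thread worker function" comment suggests was the original intent) would also avoid the error, since threads share `queue.Queue` and `self.app_list` without pickling.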

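Similar resources:

python3.x - Advanced crawler question with python mitmproxy, how do I solve it?

I want to pass downstream_port to the tiktok_response_interceptor.py script. My current approach is to create tiktok_response_interceptor-9092.py, tiktok_response_interceptor-9093.py and tiktok_response_interceptor-9094.py and hard-code the port in each file, which is probably not the best approach
Example of unlocking with multiple threads in a Python 3 crawler

This article presents an example of unlocking with multiple threads in a Python 3 crawler, including usage tips and caveats, for anyone who needs it. In everyday life we lock the door to keep the things in a room safe, and unlock it again when we need to go in. Similarly, we previously covered the lock in multithreading, an instruction that locks so that multiple running threads do not cause errors. In practice, however, there is more than one thread and more than one instruction, so if everything were lo…
python crawler - Python 3 crawler: what encoding is this?

Original content: decode('utf-8') raises: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 1: invalid continuation byte decode('utf-8', 'ignore'): decode('gbk', 'ignore'): decode('utf-16', 'ig
Python multithreaded crawler

Main topics: the multithreading workflow, the Queue model, a multithreaded crawler example. A web crawler is an I/O-bound program: it performs a lot of network I/O and local disk I/O, which takes a great deal of time and lowers execution efficiency. Python's multithreading can improve the efficiency of I/O-bound programs to some extent. To learn about Python multiprocessing, multithreading and the GIL (Global Interpreter Lock), see the "Python Concurrency Tutorial" (《Python并发编程教程》). Multithreading workflow: Python provides two … supporting multithr…
Python anti-anti-crawling: disguising the crawler as a browser

This article shows how to get past anti-crawler measures in Python by disguising the crawler as a browser, including usage tips and caveats, for anyone who needs it. Some websites ban an IP after it sends too many requests, so here we crawl while imitating a browser, i.e. we make the server believe it is being accessed by a real browser rather than a machine. The simplest approach is to add request headers and pass the browser's information with each request: open the browser -- open developer tools -- request any website, as in the figure below: find the name of the request, o…
Running Spark operators from Python always throws errors; how do I fix it?

Running Spark operators from Python always throws errors. I'm a beginner; could anyone experienced tell me how to fix it? 24/06/17 16:31:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.net.SocketException: Connection reset 24/06/17 16:31:58 WARN TaskSetManager: Lo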