Learning Python and threading. I think my code runs infinitely. Help me find bugs?

哈骞仕

2023-03-14

Question:

I've just started learning Python, and I absolutely love it.

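I'm building a small Facebook data scraper. Basically, it uses the Graph API to scrape the first names of a specified number of users. It works fine in a single thread (or with no threads, I guess).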

Using online tutorials, I came up with the following multithreaded version (updated code):

import requests
import json
import time
import threading
import Queue

GraphURL = 'http://graph.facebook.com/'
first_names = {} # will store first names and their counts
queue = Queue.Queue()

def getOneUser(url):
    http_response = requests.get(url) # open the request URL
    if http_response.status_code == 200:
        data = http_response.text.encode('utf-8', 'ignore') # Get the text of response, and encode it
        json_obj = json.loads(data) # load it as a json object
        # name = json_obj['name']
        return json_obj['first_name']
        # last = json_obj['last_name']
    return None

class ThreadGet(threading.Thread):
    """ Threaded name scraper """
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            #print 'thread started\n'
            url = GraphURL + str(self.queue.get())
            first = getOneUser(url) # get one user's first name
            if first is not None:
                if first_names.has_key(first): # if name has been encountered before
                    first_names[first] = first_names[first] + 1 # increment the count
                else:
                    first_names[first] = 1 # add the new name
            self.queue.task_done()
            #print 'thread ended\n'

def main():
    start = time.time()
    for i in range(6):
        t = ThreadGet(queue)
        t.setDaemon(True)
        t.start()

    for i in range(100):
        queue.put(i)

    queue.join()

    for name in first_names.keys():
        print name + ': ' + str(first_names[name])

    print '----------------------------------------------------------------'
    print '================================================================'
    # Print top first names
    for key in first_names.keys():
        if first_names[key] > 2:
            print key + ': ' + str(first_names[key])

    print 'It took ' + str(time.time()-start) + 's'

main()

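Honestly, I don't understand some parts of the code, but I get the main idea. There is no output. I mean the shell shows nothing at all, so I believe it just keeps running.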

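What I'm doing is filling the queue with integers that are user IDs on Facebook. Each ID is then used to build the API call URL. getOneUser returns the first name of one user at a time. That task (the ID) is marked as "done" and the thread moves on to the next one.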

What is wrong with the code above?

Answer:

Your original run function processed only one item from the queue. In total, you only ever removed 5 items from the queue.

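A run function typically looks like this:

def run(self):
    while True:
        doUsefulWork()

That is, it contains a loop, so the thread keeps pulling work instead of exiting after a single item.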
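Even with the loop in place, a worker like this can still leave queue.join() blocked forever if the work raises an exception before task_done() is reached, because the thread dies and that item is never marked as done. Here is a minimal sketch of a loop that guards against that. It is written for Python 3 (the module is named queue there, not Queue), and do_useful_work, worker, and run_all are hypothetical names used only for illustration; the simulated failure stands in for something like a missing 'first_name' key or a network error.

```python
import queue
import threading


def do_useful_work(item, results, lock):
    """Placeholder task (hypothetical): square the item, fail on item 3."""
    if item == 3:
        raise ValueError("simulated failure")
    with lock:  # protect the shared list from concurrent updates
        results.append(item * item)


def worker(q, results, lock):
    """Loop forever, taking one item at a time from the queue."""
    while True:
        item = q.get()
        try:
            do_useful_work(item, results, lock)
        except Exception as exc:
            # Report the failure instead of letting the thread die.
            print("item %r failed: %s" % (item, exc))
        finally:
            # Always mark the item as done, otherwise q.join() never returns.
            q.task_done()


def run_all(items, num_workers=3):
    """Process all items with daemon worker threads and return sorted results."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()
    for _ in range(num_workers):
        threading.Thread(target=worker, args=(q, results, lock), daemon=True).start()
    for item in items:
        q.put(item)
    q.join()  # returns once task_done() has been called for every item
    return sorted(results)
```

Calling run_all(range(6)) finishes even though one item fails; the failed item is simply missing from the results.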

[Edit] The OP has since edited the code to include this change.

Some other useful things to try:

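Add a print statement to the run function: you'll find that it is only called 5 times.
Remove the queue.join() call, which is what causes the module to block; you will then be able to probe the state of the queue.
Put the entire body of run into a function. Verify that you can use that function in a single-threaded manner to get the desired results, then
try it with just a single worker thread, and then finally
with multiple worker threads.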
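The last three steps could be sketched roughly as follows. This is only an illustration in Python 3, not the OP's actual code: fake_fetch is a hypothetical stand-in for getOneUser that avoids the network, and process_id, run_single, and run_threaded are made-up names. A lock is added around the shared counter because several threads update it.

```python
import queue
import threading
from collections import Counter

# Made-up sample data so the example runs without any network access.
NAMES = ["Ann", "Bob", "Ann", None, "Cid", "Bob", "Ann"]


def fake_fetch(user_id):
    """Hypothetical stand-in for getOneUser: returns a first name or None."""
    return NAMES[user_id % len(NAMES)]


def process_id(user_id, counts, lock, fetch=fake_fetch):
    """The body of run, pulled out so it can be tested without threads."""
    first = fetch(user_id)
    if first is not None:
        with lock:  # only one thread updates the counter at a time
            counts[first] += 1


def run_single(ids):
    """Step 1: plain single-threaded loop."""
    counts, lock = Counter(), threading.Lock()
    for user_id in ids:
        process_id(user_id, counts, lock)
    return counts


def run_threaded(ids, num_workers):
    """Steps 2 and 3: the same function driven by one or more workers."""
    counts, lock = Counter(), threading.Lock()
    q = queue.Queue()

    def worker():
        while True:
            user_id = q.get()
            try:
                process_id(user_id, counts, lock)
            finally:
                q.task_done()  # runs even if process_id raises

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    for user_id in ids:
        q.put(user_id)
    q.join()
    return counts
```

Comparing run_single(range(100)) with run_threaded(range(100), 1) and then run_threaded(range(100), 6) gives a quick check that each step produces the same counts before real requests are swapped in for fake_fetch.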
