Question:

Python 3: failed to establish a connection, socket.gaierror: Name or service not known

屠泰平

2023-03-14

I am trying to run an email harvester. When I enter the URL manually, without the loop, I get no connection errors.

import re
import requests
import requests.exceptions
from urllib.parse import urlsplit
from collections import deque
from bs4 import BeautifulSoup


def email_harvest(starting_url):
    # starting url. replace google with your own url.
    #starting_url = 'http://www.miet.ac.in'
    print('this is the starting url ' + starting_url)
    #starting_url = website_url[i]
#   i += 1
    # a queue of urls to be crawled
    unprocessed_urls = deque([starting_url])

    # set of already crawled urls for email
    processed_urls = set()

    # a set of fetched emails
    emails = set()

    # process urls one by one from unprocessed_url queue until queue is empty
    while len(unprocessed_urls):

        # move next url from the queue to the set of processed urls
        url = unprocessed_urls.popleft()
        processed_urls.add(url)

        # extract base url to resolve relative links
        parts = urlsplit(url)
        base_url = "{0.scheme}://{0.netloc}".format(parts)
        path = url[:url.rfind('/')+1] if '/' in parts.path else url
        print (url)
        # get url's content
        #print("Crawling URL %s" % url)
        try:
            response = requests.get(url)
            print (response.status_code)
        except (requests.exceptions.MissingSchema, requests.exceptions.ConnectionError):
            # ignore pages with errors and continue with next url
            print("error crawling %s" % url)
            continue

        # extract all email addresses and add them into the resulting set
        # You may edit the regular expression as per your requirement
        new_emails = set(re.findall(r"[a-z0-9\.\-+_]+@[a-z0-9\.\-+_]+\.[a-z]+", response.text, re.I))
        emails.update(new_emails)
        print(emails)
        # create a BeautifulSoup object for the html document
        soup = BeautifulSoup(response.text, 'lxml')

        # Once this document is parsed and processed, now find and process all the anchors i.e. linked urls in this document
        for anchor in soup.find_all("a"):
            # extract link url from the anchor
            link = anchor.attrs["href"] if "href" in anchor.attrs else ''
            # resolve relative links (starting with /)
            if link.startswith('/'):
                link = base_url + link
            elif not link.startswith('http'):
                link = path + link
            # add the new url to the queue if it was not in unprocessed list nor in processed list yet
            if not link in unprocessed_urls and not link in processed_urls:
                unprocessed_urls.append(link)


website_url = tuple(open('text.txt','r'))
i = 0
while i < len(website_url):
    print (i)
    starting_url = 'http://'+ website_url[i]
    email_harvest(starting_url)
    i +=1

However, when I load the URLs from a file, I get the following error: "Name or service not known".

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: : Failed to establish a new connection: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='www.miet.ac.in

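During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "editog.py", line 39, in email_harvest
    response = requests.get(url)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.miet.ac.in ', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno -2] Name or service not known',))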

Note:

I am not behind any proxy, and nothing is being filtered.
The internet connection is stable.

2 answers

秦俊友

2023-03-14

host='www.miet.ac.in

The problem is in the string interpolation.
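
One likely reading of the truncated host above: every line read from a file keeps its trailing newline, so 'http://' + line ends in '\n' and the host name cannot be resolved. The sketch below strips each line before building the URL. The helper name read_urls, the temporary file and the example.com entry are assumptions made for illustration only; they are not part of the original script.

```python
import os
import tempfile


def read_urls(path):
    """Return one 'http://' URL per non-empty line of the file at `path`."""
    urls = []
    with open(path, "r") as handle:
        for line in handle:
            host = line.strip()  # drops the trailing '\n' and stray spaces
            if host:  # skip blank lines
                urls.append("http://" + host)
    return urls


# A temporary file stands in for text.txt (hypothetical contents).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("www.miet.ac.in\n\nexample.com\n")
    demo_path = tmp.name

# What the loop in the question builds: the newline is still attached.
with open(demo_path, "r") as handle:
    raw_lines = tuple(handle)
raw_url = "http://" + raw_lines[0]
print(repr(raw_url))  # 'http://www.miet.ac.in\n'

# What the stripped version builds.
cleaned = read_urls(demo_path)
print(cleaned)  # ['http://www.miet.ac.in', 'http://example.com']

os.remove(demo_path)
```

In the original script this would mean iterating over read_urls('text.txt') and passing each item to email_harvest, instead of indexing into tuple(open('text.txt','r')).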

蓟捷

2023-03-14

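It looks like the connection is being attempted against an invalid URL.

HTTPConnectionPool(host='www.miet.ac.in

This URL ('www.miet.ac.in

If it is valid, can you add the version of your code without the loop?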
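
To check whether a URL that looks fine on screen is really valid, it can help to print its repr() and look for whitespace or control characters. This is only a minimal sketch; hidden_characters is a hypothetical helper, not something provided by requests or urllib3.

```python
def hidden_characters(url):
    """Return (index, character) pairs for whitespace or non-printable characters in `url`."""
    return [
        (index, char)
        for index, char in enumerate(url)
        if char.isspace() or not char.isprintable()
    ]


for candidate in ("http://www.miet.ac.in", "http://www.miet.ac.in\n"):
    problems = hidden_characters(candidate)
    # repr() makes characters such as '\n' visible in the output
    print(repr(candidate), "->", problems if problems else "looks clean")
```

An empty list means nothing suspicious was found in the string itself; it does not prove that the host exists or resolves.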
