Setting a proxy for google-api-python-client (googleapiclient)

充培
2023-12-01

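My project needs to fetch the comments on YouTube videos through the YouTube Data API v3. Google's Python sample code uses the google-api-python-client package, but the documentation does not explain how to enable a proxy. After a morning of trial and error, I finally got the API working through a proxy by reading the source code.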

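Failed attempts

Calling the API with requests

Since the client did not seem to offer proxy support, I decided to drop the package and build the API requests directly with requests. Unfortunately, every attempt ended in either an SSL error or a 403 response.

Setting a global socket proxy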

issue #569

Hello, you know the google APIs are blocked in China, so we can only access these APIs by proxy, but I don’t know whether the python client support proxy setup? If yes, please tell me how. If no, could you please add the proxy feature? Thanks!

However, I can set up a global proxy manually by this way, but it a global proxy, other requests would go through with the proxy, which is unnecessary.

import socket
from httplib2 import socks
import google_auth_oauthlib.flow

# Socks5 proxy
socket.socket = socks.socksocket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 1086)

flow = google_auth_oauthlib.flow.Flow.from_client_secrets_file(
        CLIENT_SECRETS_FILE, scopes=SCOPES)
flow.authorization_url(
        # Enable offline access so that you can refresh an access token without
        # re-prompting the user for permission. Recommended for web server apps.
        access_type='offline',
        # Enable incremental authorization. Recommended as a best practice.
        include_granted_scopes='true')

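With this approach my socket connection timed out. I suspect it has to do with the proxy server. I am not very familiar with SOCKS5, and after a long time tweaking it I still could not get it to work.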
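One way to narrow down a timeout like this is to first check whether anything is listening on the proxy port at all. The following is a minimal sketch using only the standard library. The helper name `proxy_reachable` is hypothetical, and the check only confirms that a TCP connection can be opened, not that the server actually speaks SOCKS5.

```python
import socket


def proxy_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (including timeouts and
        # "connection refused") when the port cannot be reached.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `proxy_reachable("127.0.0.1", 1086)` before applying the global proxy would show whether the local proxy from the snippet above is running. If it returns False, the problem is probably on the proxy side rather than in the Python code.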

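The approach that worked

Since none of that worked, I turned to the source code. Sending a request has to go through urllib3 or some similar package at some point, so there should be a way in.

That led me to googleapiclient.discovery.build(). Here is part of the function's source: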

def build(
    serviceName,
    version,
    http=None,
    discoveryServiceUrl=None,
    developerKey=None,
    model=None,
    requestBuilder=HttpRequest,
    credentials=None,
    cache_discovery=True,
    cache=None,
    client_options=None,
    adc_cert_path=None,
    adc_key_path=None,
    num_retries=1,
    static_discovery=None,
    always_use_jwt_access=False,
):
    """Construct a Resource for interacting with an API.

    Construct a Resource object for interacting with an API. The serviceName and
    version are the names from the Discovery service.

    Args:
      serviceName: string, name of the service.
      version: string, the version of the service.
      http: httplib2.Http, An instance of httplib2.Http or something that acts
        like it that HTTP requests will be made through.

This makes it clear: the parameter to look at is http.

http: httplib2.Http, An instance of httplib2.Http or something that acts like it that HTTP requests will be made through.

The fix is then straightforward: pass in an httplib2.Http instance configured with the proxy settings.

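import httplib2
import googleapiclient.discovery

# Placeholder values: substitute your own API key.
api_service_name = "youtube"
api_version = "v3"
DEVELOPER_KEY = "YOUR_API_KEY"

# Route only this client's requests through a local HTTP proxy.
proxy_info = httplib2.ProxyInfo(proxy_type=httplib2.socks.PROXY_TYPE_HTTP, proxy_host="127.0.0.1", proxy_port=10809)
http = httplib2.Http(timeout=10, proxy_info=proxy_info, disable_ssl_certificate_validation=False)

youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY, http=http)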
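
To avoid hard-coding the proxy address, the host and port could be read from a URL string instead. This is a sketch under assumptions, not a definitive implementation: `parse_proxy_url` and `build_youtube_client` are hypothetical helper names, and `"youtube"` and `"v3"` are assumed values for the service name and version. The only library calls used are the ones shown above: `httplib2.ProxyInfo`, `httplib2.Http` and `googleapiclient.discovery.build`.

```python
from typing import Tuple
from urllib.parse import urlsplit


def parse_proxy_url(proxy_url: str) -> Tuple[str, int]:
    """Split a URL such as 'http://127.0.0.1:10809' into (host, port)."""
    parts = urlsplit(proxy_url)
    # This sketch only handles plain HTTP proxies with an explicit port.
    if parts.scheme != "http" or not parts.hostname or not parts.port:
        raise ValueError(f"expected http://host:port, got {proxy_url!r}")
    return parts.hostname, parts.port


def build_youtube_client(proxy_url: str, developer_key: str):
    """Build an API client whose requests go through the given HTTP proxy."""
    # Imported lazily so parse_proxy_url works without these packages.
    import httplib2
    import googleapiclient.discovery

    host, port = parse_proxy_url(proxy_url)
    proxy_info = httplib2.ProxyInfo(
        proxy_type=httplib2.socks.PROXY_TYPE_HTTP,
        proxy_host=host,
        proxy_port=port,
    )
    http = httplib2.Http(timeout=10, proxy_info=proxy_info)
    # "youtube" and "v3" are assumed values for the service name and version.
    return googleapiclient.discovery.build(
        "youtube", "v3", developerKey=developer_key, http=http
    )
```

Because the proxy is attached to a single httplib2.Http instance, other connections in the same process are unaffected. That was the drawback of the global socket approach quoted from the issue.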