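Our project needs to fetch comments on YouTube videos with the YouTube Data API v3. Google's Python sample code uses the google-api-python-client package, but the documentation does not explain how to enable a proxy. After a whole morning of trial and error, I finally got the API working through a proxy, but only after reading the library's source.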
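Since the client did not seem to offer proxy support, I first dropped the package and built the API requests myself with requests. Unfortunately, every attempt ended in either an SSL error or a 403 response.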
I also found someone asking the maintainers exactly this:
Hello, as you know, Google APIs are blocked in China, so we can only access them through a proxy. Does the Python client support proxy configuration? If yes, please tell me how. If not, could you please add a proxy feature? Thanks!
I can set up a global proxy manually as shown below, but it is global: all other requests would also go through the proxy, which is unnecessary.
import socket
from httplib2 import socks
import google_auth_oauthlib.flow

CLIENT_SECRETS_FILE = "client_secret.json"  # placeholder: path to your OAuth client file
SCOPES = ["https://www.googleapis.com/auth/youtube.force-ssl"]  # placeholder scope

# Socks5 proxy: every socket created after this point goes through 127.0.0.1:1086
socket.socket = socks.socksocket
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, "127.0.0.1", 1086)

flow = google_auth_oauthlib.flow.Flow.from_client_secrets_file(
    CLIENT_SECRETS_FILE, scopes=SCOPES)
authorization_url, state = flow.authorization_url(
    # Enable offline access so that you can refresh an access token without
    # re-prompting the user for permission. Recommended for web server apps.
    access_type='offline',
    # Enable incremental authorization. Recommended as a best practice.
    include_granted_scopes='true')
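The problem raised above is that the monkeypatch affects the whole process. If you only need the proxy around a specific block of code, one option is to swap `socket.socket` temporarily and restore it on exit. This is a minimal sketch. `patched_socket` is a hypothetical helper, not part of httplib2 or any Google library. You would pass it `socks.socksocket` after calling `socks.setdefaultproxy(...)`.

```python
import contextlib
import socket


@contextlib.contextmanager
def patched_socket(replacement):
    """Temporarily replace socket.socket with `replacement` (hypothetical helper).

    Typical use: replacement=socks.socksocket after socks.setdefaultproxy(...).
    Not thread-safe: other threads also see the patched class while inside.
    """
    original = socket.socket
    socket.socket = replacement
    try:
        yield
    finally:
        # Restore even if the body raised, so later code is unaffected.
        socket.socket = original
```

Keep two caveats in mind. Connections opened inside the `with` block keep using the proxy if a library caches and reuses them later. Other threads running at the same time are affected too.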
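With this approach my socket connections timed out, probably because of something on the proxy-server side. I don't know SOCKS5 well, and despite a long time spent tweaking it, I never got it to work.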
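When you hit a timeout like this, it helps to first confirm that the proxy port is reachable at all. Below is a minimal stdlib sketch. `proxy_port_open` is a hypothetical helper, and the host and port to test (for example `127.0.0.1`, `1086`) are whatever your proxy uses. Run it in a fresh interpreter: once `socket.socket` has been monkeypatched, `socket.create_connection` would itself go through the patched class.

```python
import socket


def proxy_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within timeout.

    Success only shows that something is listening on the port; it does not
    check that it speaks the SOCKS5 or HTTP proxy protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        # (socket.timeout is a subclass of OSError).
        return False
```

For example, `proxy_port_open("127.0.0.1", 1086)` returning `False` points at the proxy setup rather than at the Google client.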
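Since nothing worked, I turned to the source. Sending a request must eventually go through urllib3 or a similar HTTP library, so I expected to find a way in there.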
That led me to googleapiclient.discovery.build(). Here is part of its source:
def build(
    serviceName,
    version,
    http=None,
    discoveryServiceUrl=None,
    developerKey=None,
    model=None,
    requestBuilder=HttpRequest,
    credentials=None,
    cache_discovery=True,
    cache=None,
    client_options=None,
    adc_cert_path=None,
    adc_key_path=None,
    num_retries=1,
    static_discovery=None,
    always_use_jwt_access=False,
):
    """Construct a Resource for interacting with an API.

    Construct a Resource object for interacting with an API. The serviceName and
    version are the names from the Discovery service.

    Args:
      serviceName: string, name of the service.
      version: string, the version of the service.
      http: httplib2.Http, An instance of httplib2.Http or something that acts
        like it that HTTP requests will be made through.
This makes it clear: the parameter to focus on is http, documented as "an instance of httplib2.Http or something that acts like it that HTTP requests will be made through."
The fix follows directly: pass build() an httplib2.Http instance that has been configured with proxy information.
import httplib2
import googleapiclient.discovery

api_service_name, api_version = "youtube", "v3"
DEVELOPER_KEY = "YOUR_API_KEY"  # placeholder: your own API key
# Send requests through the local HTTP proxy at 127.0.0.1:10809
proxy_info = httplib2.ProxyInfo(proxy_type=httplib2.socks.PROXY_TYPE_HTTP, proxy_host="127.0.0.1", proxy_port=10809)
http = httplib2.Http(timeout=10, proxy_info=proxy_info, disable_ssl_certificate_validation=False)
youtube = googleapiclient.discovery.build(
    api_service_name, api_version, developerKey=DEVELOPER_KEY, http=http)
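You can also describe the proxy as a URL and pick between an HTTP proxy and a SOCKS proxy (such as the 127.0.0.1:1086 SOCKS5 proxy from the earlier snippet) without editing constants. This is a sketch under assumptions. `parse_proxy_url` and `make_proxied_http` are hypothetical helpers, only the `http`, `socks4` and `socks5` schemes are handled, and the mapping onto `httplib2.socks` constants should be checked against your installed httplib2.

```python
from urllib.parse import urlsplit

# Assumed mapping from URL scheme to the name of an httplib2.socks constant.
_PROXY_TYPES = {
    "http": "PROXY_TYPE_HTTP",
    "socks4": "PROXY_TYPE_SOCKS4",
    "socks5": "PROXY_TYPE_SOCKS5",
}


def parse_proxy_url(url):
    """Split e.g. 'socks5://127.0.0.1:1086' into (type_name, host, port)."""
    parts = urlsplit(url)
    if parts.scheme not in _PROXY_TYPES:
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"proxy URL needs an explicit host and port: {url!r}")
    return _PROXY_TYPES[parts.scheme], parts.hostname, parts.port


def make_proxied_http(url, timeout=10):
    """Return an httplib2.Http whose requests all go through the proxy at `url`."""
    import httplib2  # imported lazily so parse_proxy_url works without httplib2

    type_name, host, port = parse_proxy_url(url)
    proxy_info = httplib2.ProxyInfo(
        proxy_type=getattr(httplib2.socks, type_name),
        proxy_host=host,
        proxy_port=port,
    )
    return httplib2.Http(timeout=timeout, proxy_info=proxy_info)
```

Usage would then be `googleapiclient.discovery.build("youtube", "v3", developerKey=DEVELOPER_KEY, http=make_proxied_http("socks5://127.0.0.1:1086"))`. If you authenticate with OAuth credentials instead of developerKey, note that build() rejects passing both http and credentials. One way around that is to wrap the proxied Http with `google_auth_httplib2.AuthorizedHttp(credentials, http=...)` and pass the wrapper as http.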