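For a study project, I needed to download videos from YouTube in bulk to use as a deep learning dataset. The code released with the paper downloads them with pytube, so I am recording the process here as I learn it.
pytube is a Python library for downloading YouTube videos, and according to what I read online it can also be used from the command line. The paper's code downloads through the command line: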
os.system('pytube -e mp4 -r 360p -f ' + name + ' ' + lines[i].split(' ')[0])
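When I tried it, pytube kept reporting an argument error. I could not find a working fix online, and it seems hardly anyone else has even run into this problem. Has this approach been abandoned? With no other option, I switched to downloading through the API.
Since there are many videos to download, I decided to test with a single video first, so I wrote test.py as follows: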
from pytube import YouTube
URL=r'https://www.youtube.com/watch?v=ABl8-quPXcw'
path=r'/root/upload'
yt=YouTube(URL)
yt.streams.first().download(path)
It failed with the following error:
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    yt=YouTube(URL)
  File "/usr/local/lib/python3.6/dist-packages/pytube/__main__.py", line 88, in __init__
    self.prefetch_init()
  File "/usr/local/lib/python3.6/dist-packages/pytube/__main__.py", line 97, in prefetch_init
    self.init()
  File "/usr/local/lib/python3.6/dist-packages/pytube/__main__.py", line 133, in init
    mixins.apply_signature(self.player_config_args, fmt, self.js)
  File "/usr/local/lib/python3.6/dist-packages/pytube/mixins.py", line 49, in apply_signature
    signature = cipher.get_signature(js, stream['s'])
KeyError: 's'
The cause is that YouTube recently changed how it signs URLs. The fix is as follows:
Find the file mixins.py and change
if 'signature=' in url:
to:
if ('signature=' in url) or ('&sig=' in url) or ('&lsig=' in url):
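To make the effect of the patched condition easier to see, here is a minimal standalone sketch of the same check. The helper name `has_signature` and the example URLs are made up for illustration and are not part of pytube. As far as I understand it, pytube uses this test to decide whether a stream URL already carries a signature, in which case the deciphering step that raised the KeyError is skipped.

```python
# Hypothetical helper, not part of pytube: it mirrors the patched condition
# so the three markers can be tried on sample URLs.
SIGNATURE_MARKERS = ('signature=', '&sig=', '&lsig=')


def has_signature(url):
    """Return True if the URL contains any of the known signature markers."""
    return any(marker in url for marker in SIGNATURE_MARKERS)


# Made-up URLs for illustration only.
old_style = 'https://example.com/videoplayback?id=1&signature=abc'
new_style = 'https://example.com/videoplayback?id=1&sig=abc'
unsigned = 'https://example.com/videoplayback?id=1'

print(has_signature(old_style))  # matched by the original check as well
print(has_signature(new_style))  # matched only by the patched check
print(has_signature(unsigned))   # no marker, so deciphering would still run
```

The original one-marker check would treat the second URL as unsigned, which is consistent with the KeyError in the traceback.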
Reference: https://stackoverflow.com/questions/56548629/pytube-v-9-5-0-signature-error-in-mixins-py
With this change, a single video downloads correctly. Next, I modified the code from the paper:
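import os
from urllib.error import HTTPError

from pytube import YouTube
from pytube.exceptions import RegexMatchError, VideoUnavailable

MAX_NUM_VIDS = 70000
OUT_DIR = '/root/upload/S1M'

# Make sure the output directory exists before downloading into it.
os.makedirs(OUT_DIR, exist_ok=True)

# Each line starts with the video URL; the rest of the line is ignored here.
with open('train_partition.txt', 'r') as f:
    lines = f.readlines()

missing = []  # URLs that could not be downloaded
# Do not index past the end of the file if it has fewer than MAX_NUM_VIDS lines.
for i in range(min(MAX_NUM_VIDS, len(lines))):
    name = 'sports-1m_{0:09d}'.format(i)
    url = lines[i].split(' ')[0].strip()
    try:
        yt = YouTube(url)
    except (VideoUnavailable, RegexMatchError):
        # The video is gone or its page could not be parsed.
        missing.append(url)
        print(name + ' missing!')
        continue

    stream = yt.streams.filter(subtype='mp4', resolution='360p').first()
    if stream is None:
        # No 360p mp4 stream is offered for this video.
        missing.append(url)
        print(name + ' missing!')
        continue

    try:
        stream.download(OUT_DIR, filename=name)
    except HTTPError:
        missing.append(url)
        print(name + ' missing!')
    else:
        print(name + ' downloaded successfully!')

# len(missing), not missing.length: Python lists have no length attribute.
for url in missing:
    print(url + ' failed!')
print('done.')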
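The parsing and naming steps of the script can be checked without touching the network. The sketch below assumes, as the script does, that each line of train_partition.txt begins with the video URL followed by other fields. The function names and the sample lines are made up for illustration.

```python
def parse_partition_line(line):
    """Return the first whitespace-separated field (the URL), or None if the line is blank."""
    fields = line.split()
    return fields[0] if fields else None


def clip_name(index):
    """Build the output name used by the script, zero-padded to nine digits."""
    return 'sports-1m_{0:09d}'.format(index)


def plan_downloads(lines, max_num_vids):
    """Return (name, url) pairs for at most max_num_vids lines, skipping blank ones."""
    plan = []
    for i, line in enumerate(lines[:max_num_vids]):
        url = parse_partition_line(line)
        if url is not None:
            plan.append((clip_name(i), url))
    return plan


# Made-up sample lines; the real file is train_partition.txt.
sample = [
    'https://www.youtube.com/watch?v=ABl8-quPXcw 1,2\n',
    '\n',
    'https://www.youtube.com/watch?v=XXXXXXXXXXX 3\n',
]
for name, url in plan_downloads(sample, 70000):
    print(name, url)
```

Names keep the index of the line in the file, so a skipped blank line leaves a gap in the numbering rather than shifting later names.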
pytube is powerful and easy to use. The best way to work with it is still to read the documentation and the source code.
Documentation: https://python-pytube.readthedocs.io/en/latest/user/quickstart.html
GitHub: https://github.com/nficano/pytube