gnu bash
GNU parallel is a command line tool for running jobs in parallel.
parallel is awesome and belongs in the toolbox of every programmer. But I found the docs a bit overwhelming at first. Fortunately, you can start being useful with parallel after learning just a few basic commands.
Why is parallel so useful?
Let’s compare sequential and parallel execution of the same compute-intensive task.
Imagine you have a folder of .wav audio files that you want to convert to .flac.
These are pretty big files: each one is at least a gigabyte.
We’ll use another great command line tool, ffmpeg, to convert the files. Here’s what we need to run for each file.
ffmpeg -i audio1.wav audio1.flac
Let’s write a script to convert each one sequentially:
# convert.sh
ffmpeg -i audio1.wav audio1.flac
ffmpeg -i audio2.wav audio2.flac
ffmpeg -i audio3.wav audio3.flac
ffmpeg -i audio4.wav audio4.flac
ffmpeg -i audio5.wav audio5.flac
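We can time the execution of a job by prepending time to the command when calling the script from the terminal (make the script executable first with chmod +x convert.sh). time prints the real (wall-clock) time elapsed during execution, along with the user and system CPU time.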
time ./convert.sh
Our script finishes in a little over a minute.
Not bad. But now let’s run it in parallel!
We don’t have to change anything about our script. With the -a (--arg-file) flag, we can hand our script directly to parallel as its input. Because we don’t give parallel a command of its own, it runs every line of the file as a separate command.
parallel -a ./convert.sh
Using parallel, our conversion ran in a little over half the time. Nice!
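The speedup comes from running several jobs at once: parallel keeps a fixed number of job slots busy and starts the next line as soon as a slot frees up. Here is a minimal Python sketch of that idea for a file of shell commands, assuming a Unix-like /bin/sh. It is only an analogy (GNU parallel itself is a Perl program with much more behavior, such as grouping each job's output), and run_lines, timed and the sleeping stand-in jobs are hypothetical names chosen for illustration.

```python
import shlex
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_lines(lines: list[str], jobs: int) -> list[int]:
    """Run each non-blank line as its own shell command, at most `jobs` at a time.

    Returns the exit codes in input order.
    """
    commands = [line.strip() for line in lines if line.strip()]

    def run(cmd: str) -> int:
        # shell=True hands the line to /bin/sh, much like a line in convert.sh
        return subprocess.run(cmd, shell=True).returncode

    # The pool starts the next command as soon as one of its `jobs` workers is free
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run, commands))


def timed(lines: list[str], jobs: int) -> float:
    """Wall-clock seconds to run all lines with the given number of job slots."""
    start = time.perf_counter()
    run_lines(lines, jobs)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical stand-in for one ffmpeg call: a Python process that sleeps 0.3 s
    job = f"{shlex.quote(sys.executable)} -c 'import time; time.sleep(0.3)'"
    script = [job] * 5
    print(f"sequential: {timed(script, jobs=1):.2f}s")
    print(f"parallel:   {timed(script, jobs=5):.2f}s")
```

Because these stand-in jobs only sleep, they don't compete for CPU, so the speedup comes close to the number of slots; CPU-heavy jobs such as ffmpeg conversions are capped by how many cores you have.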
With only five files, this difference isn’t such a big deal. But with larger lists and longer tasks, we can save a lot of time with parallel.
I encountered parallel while working with a data processing task that would likely have run for an hour or more if done sequentially. With parallel, it took only a few minutes.
How much time parallel can save also depends on your computer. My MacBook Pro’s Intel i7 has only 4 cores, and even this small task pushed them all to their limit.
More powerful computers might have processors with 8, 16, or even 32 cores, offering massive time savings when you parallelize your jobs.
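As a rough model of why core count matters: by default GNU parallel runs one job per CPU (the -j/--jobs option changes that), so n equally long jobs on s slots finish in about ceil(n / s) rounds. The sketch below is an idealized estimate, not a measurement. It assumes every job takes the same time and ignores disk and memory contention; estimate_wall_time is a hypothetical helper, and the 12-second job length is an assumption chosen to match the roughly one-minute sequential run of five files.

```python
import math
import os


def estimate_wall_time(n_jobs: int, slots: int, job_seconds: float) -> float:
    """Idealized wall time for n_jobs equal-length jobs run `slots` at a time."""
    rounds = math.ceil(n_jobs / slots)  # each round runs up to `slots` jobs side by side
    return rounds * job_seconds


if __name__ == "__main__":
    # Logical CPUs the OS reports (can include hyperthreads); None -> assume 1
    cpus = os.cpu_count() or 1
    print(f"This machine reports {cpus} CPUs")
    # Assume ~12 s per file, so five files take ~60 s one after another
    print(estimate_wall_time(5, slots=1, job_seconds=12.0))  # 60.0
    print(estimate_wall_time(5, slots=4, job_seconds=12.0))  # 24.0
    print(estimate_wall_time(5, slots=cpus, job_seconds=12.0))
```

Under these assumptions, five files on four slots take two rounds because the fifth file runs alone, so the best case is 60 / 24 = 2.5 times faster, not 4. Real jobs also compete for disk and memory, which is one plausible reason measured gains land lower, as in the "little over half the time" result above.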
Being Useful with parallel
The other great benefit of parallel is its brevity and simplicity. Let's start with a nasty Python script and convert it to a clean call to parallel.
Here’s a Python script to accomplish our audio file conversion:
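import subprocess
from pathlib import Path
# Folder holding the .wav files to convert
path = Path.home() / 'my-data-here'
# Convert one file at a time (sequentially)
for audio_file in path.glob('*.wav'):
    cmd = ['ffmpeg',
           '-i',
           str(audio_file),
           # Same folder and name, .flac extension (safe for names with extra dots)
           str(audio_file.with_suffix('.flac'))]
    subprocess.run(cmd, stdout=subprocess.PIPE)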
Yikes! That’s a lot of code to think about just to convert some files. (This takes about 1.2 minutes to run.)
Let’s convert our Python to parallel.
Calling a script with parallel -a
parallel -a your-script-here.sh is the nice one-liner we used above to feed our bash script to parallel.
This is great, but it does require you to write out the bash script you want to execute. In our example, we still wrote out every individual call to ffmpeg in convert.sh.
Pipes and String Interpolation with parallel
Luckily, parallel gives us a way to delete convert.sh entirely.
Here’s all we have to run to accomplish our conversion:
ls *.wav | parallel ffmpeg -i {} {.}.flac
Let’s break this down.
We’re getting a list of all the .wav files in our directory with ls *.wav. Then we’re piping (|) that list to parallel.
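Under the hood, parallel reads its input one line at a time and turns each line into the argument for one job (it numbers the jobs too, available as {#}). A minimal Python sketch of that splitting step, assuming plain newline-separated input; split_lines and split_nul are hypothetical helpers, and the sample listing is made up:

```python
def split_lines(data: str) -> list[str]:
    """Default input mode: one job argument per newline-terminated line."""
    items = data.split("\n")
    if items and items[-1] == "":
        items.pop()  # the final newline ends the last line; it does not start a new job
    return items


def split_nul(data: str) -> list[str]:
    """Like parallel -0 (--null): arguments separated by NUL characters (empty entries dropped)."""
    return [item for item in data.split("\0") if item]


if __name__ == "__main__":
    listing = "audio1.wav\naudio2.wav\n"  # stand-in for the output of `ls *.wav`
    for seq, arg in enumerate(split_lines(listing), start=1):
        print(f"job {seq}: ffmpeg input = {arg}")
    # A file name containing a newline is split into two bogus jobs...
    print(split_lines("odd\nname.wav\n"))  # ['odd', 'name.wav']
    # ...but survives intact when the list is NUL-separated
    print(split_nul("odd\nname.wav\0"))  # ['odd\nname.wav']
```

The last two prints show the catch with newline-separated input: a file name that itself contains a newline becomes two jobs. The usual remedy is NUL-separated input, for example find . -name '*.wav' -print0 | parallel -0 ffmpeg -i {} {.}.flac.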
parallel provides a few replacement strings for string interpolation, so each file path is inserted into the command correctly.
The first is {}, which parallel automatically replaces with one line from our input.
The second is {.}, which inserts the same input line with its file extension removed. Only the last extension goes, so archive.tar.gz would become archive.tar.
If we expanded the command run by parallel for our first line of input, we would see:
ffmpeg -i audio1.wav audio1.flac
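To make the two replacement strings concrete, here is a minimal Python sketch that expands a command template for one input line. It mirrors the documented behavior of {} and {.} under simplifying assumptions: strip_ext and expand are hypothetical helpers, and while parallel also quotes inserted values so that a name with spaces stays a single argument, its exact quoting style may differ from shlex.quote.

```python
import re
import shlex


def strip_ext(arg: str) -> str:
    """Mimic {.}: remove a final '.ext' only if it comes after the last '/'."""
    return re.sub(r"\.[^/.]*$", "", arg)


def expand(template: str, arg: str) -> str:
    """Fill {} and {.} in a command template, shell-quoting each inserted value."""

    def fill(match: re.Match) -> str:
        value = strip_ext(arg) if match.group(0) == "{.}" else arg
        return shlex.quote(value)

    # One pass over the template, so text inserted for {} is never re-scanned
    return re.sub(r"\{\.?\}", fill, template)


if __name__ == "__main__":
    template = "ffmpeg -i {} {.}.flac"
    print(expand(template, "audio1.wav"))   # ffmpeg -i audio1.wav audio1.flac
    print(expand(template, "my song.wav"))  # ffmpeg -i 'my song.wav' 'my song'.flac
```

The second example shows why quoting matters: without it, the shell would split my song.wav into two separate arguments.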
Args with Parallel
As it turns out, we don’t even need to pipe from ls to complete our task. We can go simpler still:
parallel ffmpeg -i {} {.}.flac ::: *.wav
Arguments passed to parallel come after the command and are separated from it by :::. In this case, our argument is *.wav, which the shell expands into the list of all .wav files in our directory before parallel even starts. These files become the input for our blazing-fast parallel job.
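One detail worth knowing: *.wav is expanded by your shell, not by parallel, so parallel just receives a list of file names after :::. This Python sketch imitates that step with the standard glob module, assuming bash's default settings, where a pattern that matches nothing is passed through literally (unless the nullglob option is set). expand_args and jobs_for are hypothetical helpers, and os.path.splitext stands in for {.}, which it matches for ordinary names like audio1.wav.

```python
import glob
import os
import tempfile


def expand_args(pattern: str) -> list[str]:
    """Roughly what bash passes to parallel after ::: for an unquoted glob."""
    matches = sorted(glob.glob(pattern))  # bash sorts its matches; glob.glob promises no order
    return matches or [pattern]  # bash default (no nullglob): an unmatched pattern stays literal


def jobs_for(pattern: str) -> list[list[str]]:
    """One ffmpeg argument list per file, like: parallel ffmpeg -i {} {.}.flac ::: *.wav"""
    return [["ffmpeg", "-i", f, os.path.splitext(f)[0] + ".flac"] for f in expand_args(pattern)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        for name in ("audio2.wav", "audio1.wav", "notes.txt"):
            open(os.path.join(folder, name), "w").close()  # empty placeholder files
        for argv in jobs_for(os.path.join(folder, "*.wav")):
            print(argv)
```

Note that bash sorts matches according to the current locale, so the order can differ slightly from Python's sorted() for unusual names.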
Fun fact: parallel was built by Ole Tange and published in 2011. According to him, you can use the tool for research without citing the source paper for the modest fee of 10,000 euros!
Thanks for reading!