Intro
GNU parallel (commonly referred to as just parallel
) is a command line tool that allows you to run multiple independent tasks at the same time by using the available cores on a computer.
This post will cover:
- How to use
parallel
- Some useful features provided by
parallel
Basic usage
The basic syntax of a parallel
command is:
parallel "<command> {}" ::: <input>
where <command>
is the command you want to run in parallel for all the <input>
you provide. The {}
represents where in the <command>
the <input>
is substituted.
For example:
parallel "echo my name is {}" ::: riku murray tanya ruth mandy
will substitute {}
with the names provided after the :::
, and output the following (in whichever order parallel will output):
my name is riku
my name is murray
my name is tanya
my name is ruth
my name is mandy
You can use {}
as many times as you like:
parallel "echo {}'s name is {}" ::: riku murray tanya ruth mandy
will output:
riku's name is riku
murray's name is murray
tanya's name is tanya
ruth's name is ruth
mandy's name is mandy
Strictly speaking, quote marks are not necessary, but recommended (by me).
Additional features
Multiple inputs
If you have multiple sets of inputs you want to parallelise, you can add another set of input with ::: <input>
after the first input, and then use a numbered {}
to specify which input you want to use.
For example:
parallel "touch {1}_{2}.txt" ::: 1 2 3 ::: A B C
will output the following files:
1_A.txt
1_B.txt
1_C.txt
2_A.txt
2_B.txt
2_C.txt
3_A.txt
3_B.txt
3_C.txt
Note that parallel
generates all possible combinations from the given inputs.
xapply option
There are situations where you don’t want to run the commands for all the combinations from your inputs, but rather run commands for the first item from each input together, then the second item, and so on. For example, echo
ing first and last name:
parallel "echo {1} {2}" ::: riku murray tanya ::: takei cadzow major
This will generate:
riku takei
riku cadzow
riku major
murray takei
murray cadzow
murray major
tanya takei
tanya cadzow
tanya major
which is not what we want parallel
to do, because we want to associate the first item of first input with the first item of second input. To do this correctly, you can supply the --xapply
option to parallel
:
parallel --xapply "echo {1} {2}" ::: riku murray tanya ::: takei cadzow major
which will generate:
riku takei
murray cadzow
tanya major
Job control
By default, parallel
will use as many cores available on the computer to run the jobs. This may be problematic in situations where you want certain number of cores free for other programs to run, or to prevent the jobs using up all the memory by accident. You can specify the number of jobs to run with -j
:
parallel -j <N> "<command> {}" ::: <input>
The <N>
is the number of concurrent jobs you want parallel
to run at the same time. Below is a guide on some of the values you can provide to the -j
option:
Value | Meaning |
---|---|
N |
Run up to N jobs at a time |
-N |
Run total number of cores minus N jobs |
N% |
Run as many jobs as N percent of the total number of cores |
file | Run as many jobs specified in a file (values as above) |
Input manipulation
When you pass file names (with or without absolute paths) to parallel
, it is sometimes desirable to edit the file name in order to build outputs based on the input. For example, if you have an input.txt
file and want an input.result.txt
as an output, you can do the following to achieve this result:
parallel "<command> {} > {.}.result.txt" ::: $(ls /path/to/files.txt)
The extra .
within the {}
tells parallel
to remove the extension of the input. Since the rest of the path is unchanged, *.result.txt
file will appear in the same directory as the input (in this case, /path/to/
).
Here is a table of some of the things you can do to the input line:
Value | Meaning |
---|---|
{.} |
File name without extension |
{/} |
Basename of input (i.e. file name without the directory) |
{//} |
Directory of input without the file name |
{/.} |
Basename of input without extension |
If you add a number to any of the above options, it will manipulate the N
th input. For example, {2.}
will remove the file extension from the second input.