GNU parallel

Tags: parallel, bash

Intro

GNU parallel (commonly referred to as just parallel) is a command line tool that allows you to run multiple independent tasks at the same time by using the available cores on a computer.

This post will cover:

  1. How to use parallel
  2. Some useful features provided by parallel

Basic usage

The basic syntax of a parallel command is:


	parallel "<command> {}" ::: <input>

where <command> is the command you want to run in parallel for all the <input> you provide. The {} represents where in the <command> the <input> is substituted.

For example:


	parallel "echo my name is {}" ::: riku murray tanya ruth mandy

will substitute {} with the names provided after the :::, and output the following (in whichever order parallel will output):


	my name is riku
	my name is murray
	my name is tanya
	my name is ruth
	my name is mandy

You can use {} as many times as you like:


	parallel "echo {}'s name is {}" ::: riku murray tanya ruth mandy

will output:


	riku's name is riku
	murray's name is murray
	tanya's name is tanya
	ruth's name is ruth
	mandy's name is mandy

Strictly speaking, quote marks are not necessary, but recommended (by me).

Additional features

Multiple inputs

If you have multiple sets of inputs you want to parallelise, you can add another set of input with ::: <input> after the first input, and then use a numbered {} to specify which input you want to use.

For example:


	parallel "touch {1}_{2}.txt" ::: 1 2 3 ::: A B C

will output the following files:


	1_A.txt
	1_B.txt
	1_C.txt
	2_A.txt
	2_B.txt
	2_C.txt
	3_A.txt
	3_B.txt
	3_C.txt

Note that parallel generates all possible combinations from the given inputs.

xapply option

There are situations where you don’t want to run the commands for all the combinations from your inputs, but rather run commands for the first item from each input together, then the second item, and so on. For example, echoing first and last name:


	parallel "echo {1} {2}" ::: riku murray tanya ::: takei cadzow major

This will generate:


	riku takei
	riku cadzow
	riku major
	murray takei
	murray cadzow
	murray major
	tanya takei
	tanya cadzow
	tanya major

which is not what we want parallel to do, because we want to associate the first item of first input with the first item of second input. To do this correctly, you can supply the --xapply option to parallel:


	parallel --xapply "echo {1} {2}" ::: riku murray tanya ::: takei cadzow major

which will generate:


	riku takei
	murray cadzow
	tanya major

Job control

By default, parallel will use as many cores available on the computer to run the jobs. This may be problematic in situations where you want certain number of cores free for other programs to run, or to prevent the jobs using up all the memory by accident. You can specify the number of jobs to run with -j:


	parallel -j <N> "<command> {}" ::: <input>

The <N> is the number of concurrent jobs you want parallel to run at the same time. Below is a guide on some of the values you can provide to the -j option:

Value Meaning
N Run up to N jobs at a time
-N Run total number of cores minus N jobs
N% Run as many jobs as N percent of the total number of cores
file Run as many jobs specified in a file (values as above)

Input manipulation

When you pass file names (with or without absolute paths) to parallel, it is sometimes desirable to edit the file name in order to build outputs based on the input. For example, if you have an input.txt file and want an input.result.txt as an output, you can do the following to achieve this result:


	parallel "<command> {} > {.}.result.txt" ::: $(ls /path/to/files.txt)

The extra . within the {} tells parallel to remove the extension of the input. Since the rest of the path is unchanged, *.result.txt file will appear in the same directory as the input (in this case, /path/to/).

Here is a table of some of the things you can do to the input line:

Value Meaning
{.} File name without extension
{/} Basename of input (i.e. file name without the directory)
{//} Directory of input without the file name
{/.} Basename of input without extension

If you add a number to any of the above options, it will manipulate the Nth input. For example, {2.} will remove the file extension from the second input.