Parallelize a Bash FOR Loop

I have been trying to parallelize the following script, specifically each of the three FOR loop instances, using GNU Parallel, but haven't been able to. The 4 commands contained within the FOR loop run in series, and each iteration takes around 10 minutes.

#!/bin/bash

kar='KAR5'
runList='run2 run3 run4'
mkdir normFunc
for run in $runList
do 
  fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
  fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
  fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
  fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear

  rm -f *.mat
done
Asked By: Ravnoor S Gill

for stuff in things
do
( something
  with
  stuff ) &
done
wait # for all the something with stuff

Whether it actually works depends on your commands; I’m not familiar with them. The rm *.mat looks a bit prone to conflicts if it runs in parallel…
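One way to avoid that conflict is for each iteration to remove only its own intermediate files instead of every .mat file in the directory (a sketch, reusing the question's file-naming scheme):

rm -f "$run".norm1.mat "$run".norm2.mat "$run".norm.mat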

Answered By: frostschutz

Why don’t you just fork (a.k.a. background) them?

foo () {
    local run=$1
    fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}

for run in $runList; do foo "$run" & done

In case that’s not clear, the significant part is here:

for run in $runList; do foo "$run" & done
                                   ^

Causing the function to be executed in a forked shell in the background. That’s parallel.
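If this runs in a script rather than an interactive shell, add a wait after the loop (as the answer above does) so the script does not exit before the backgrounded calls finish:

for run in $runList; do foo "$run" & done
wait # for all the backgrounded foo calls to finish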

Answered By: goldilocks

It seems the fsl jobs depend on each other, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.

Make a bash function running a single run and run that function in parallel:

#!/bin/bash

myfunc() {
    run=$1
    kar='KAR5'
    mkdir -p normFunc   # -p: do not fail if a parallel run already created the directory
    fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12 
    fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
    fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}

export -f myfunc
parallel myfunc ::: run2 run3 run4

To learn more, watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1 and spend an hour walking through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html. Your command line will love you for it.
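By default parallel runs one job per CPU core. If the fsl jobs are too memory-hungry to run all at once (an assumption about your machine), the standard -j option caps the number of simultaneous jobs:

parallel -j 2 myfunc ::: run2 run3 run4   # at most 2 runs at a time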

Answered By: Ole Tange

for stuff in things
do
sem -j+0 "something; 
  with; 
  stuff"
done
sem --wait

This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).

sem --wait tells sem to wait until all the iterations in the for loop have finished before executing the subsequent lines of code.

Note: you will need “parallel” from the GNU parallel project (sudo apt-get install parallel).
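Applied to the runs from the question, the pattern could look like this (a sketch; myfunc is the function exported with export -f in the previous answer):

for run in run2 run3 run4
do
    sem -j +0 "myfunc $run"
done
sem --wait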

Answered By: lev

Sample task

task(){
   sleep 0.5; echo "$1";
}

Sequential runs

for thing in a b c d e f g; do 
   task "$thing"
done

Parallel runs

for thing in a b c d e f g; do 
  task "$thing" &
done

Parallel runs in N-process batches

N=4
(
for thing in a b c d e f g; do 
   ((i=i%N)); ((i++==0)) && wait   # every N-th iteration, wait for the previous batch of N jobs
   task "$thing" & 
done
)

It’s also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes run at the same time. But it requires more code.

N processes with a FIFO-based semaphore:

# initialize a semaphore with a given number of tokens
open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$
    rm pipe-$$
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}

# run the given command asynchronously and pop/push tokens
run_with_lock(){
    local x
    # this read waits until there is something to read
    read -u 3 -n 3 x && ((0==x)) || exit $x
    (
     ( "$@"; )
    # push the return code of the command to the semaphore
    printf '%.3d' $? >&3
    )&
}

N=4
open_sem $N
for thing in {a..g}; do
    run_with_lock task $thing
done 

Explanation:

We use file descriptor 3 as a semaphore by pushing (= printf) and popping (= read) tokens ('000'). By pushing the return code of the executed tasks, we can abort if something went wrong.

Answered By: PSkocik

One really easy way that I often use:

cat "args" | xargs -P $NUM_PARALLEL command

This will run command once per line of the args file, in parallel, with at most $NUM_PARALLEL processes at the same time (-n 1 makes xargs pass a single argument per invocation; without it, xargs may pack many arguments into one call, defeating -P).

You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
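For example, -I defines a placeholder that xargs replaces with each input line (and it implies one line per command invocation); command and its flags below are placeholders:

cat args | xargs -P $NUM_PARALLEL -I {} command --input {} --output {}.out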

Answered By: eyeApps LLC

I had trouble with @PSkocik’s solution. My system does not have GNU Parallel available as a package, and sem threw an exception when I built and ran it manually. I then tried the FIFO semaphore example as well, which also threw some other errors regarding communication.

@eyeApps suggested xargs but I didn’t know how to make it work with my complex use case (examples would be welcome).

Here is my solution for parallel jobs which process up to N jobs at a time as configured by _jobs_set_max_parallel:

_lib_jobs.sh:

function _jobs_get_count_e {
   jobs -r | wc -l | tr -d " "
}

function _jobs_set_max_parallel {
   g_jobs_max_jobs=$1
}

function _jobs_get_max_parallel_e {
   [[ $g_jobs_max_jobs ]] && {
      echo $g_jobs_max_jobs

      return 0
   }

   echo 1
}

function _jobs_is_parallel_available_r() {
   (( $(_jobs_get_count_e) < $g_jobs_max_jobs )) &&
      return 0

   return 1
}

function _jobs_wait_parallel() {
   # Sleep between available jobs
   while true; do
      _jobs_is_parallel_available_r &&
         break

      sleep 0.1s
   done
}

function _jobs_wait() {
   wait
}

Example usage:

#!/bin/bash

source "_lib_jobs.sh"

_jobs_set_max_parallel 3

# Run 10 jobs in parallel with varying amounts of work
for a in {1..10}; do
   _jobs_wait_parallel

   # Sleep between 1-2 seconds to simulate busy work
   sleep_delay=$(echo "scale=1; $(shuf -i 10-20 -n 1)/10" | bc -l)

   ( ### ASYNC
   echo $a
   sleep ${sleep_delay}s
   ) &
done

# Visualize jobs
while true; do
   n_jobs=$(_jobs_get_count_e)

   [[ $n_jobs = 0 ]] &&
      break

   sleep 0.1s
done

Answered By: Zhro

Parallel execution in max N-process concurrent

Just a vanilla bash script – no external libs/apps needed.

#!/bin/bash

N=4

for i in {a..z}; do
    (
        # .. do your stuff here
        echo "starting task $i.."
        sleep $(( (RANDOM % 3) + 1))
    ) &

    # allow up to $N jobs to execute in parallel
    if [[ $(jobs -r -p | wc -l) -ge $N ]]; then
        # now there are $N jobs already running, so wait here for any job
        # to finish so there is room to start the next one
        # (wait -n requires bash 4.3 or newer)
        wait -n
    fi

done

# no more jobs to be started but wait for pending jobs
# (all need to be finished)
wait

echo "all done"

Another example of processing a list of files in parallel:

#!/bin/bash

N=4

find ./my_pictures/ -name "*.jpg" | (
    while IFS= read -r filepath; do
        jpegoptim "${filepath}" &
        if [[ $(jobs -r -p | wc -l) -ge $N ]]; then wait -n; fi
    done;
    wait
)
Answered By: Tomasz Hławiczka

In my case, I can’t use semaphore (I’m in git-bash on Windows), so I came up with a generic way to split the task among N workers, before they begin.

It works well if the tasks take roughly the same amount of time. The disadvantage is that, if one of the workers takes a long time to do its part of the job, the others that already finished won’t help.

Splitting the job among N workers (1 per core)

# array of assets, assuming at least 1 item exists
listAssets=( {a..z} ) # example: a b c d .. z
# listAssets=( ~/"path with spaces/"*.txt ) # could be file paths

# replace with your task
task() { # $1 = idWorker, $2 = asset
  echo "Worker $1: Asset '$2' START!"
  # simulating a task that randomly takes 3-6 seconds
  sleep $(( ($RANDOM % 4) + 3 ))
  echo "    Worker $1: Asset '$2' OK!"
}

nVirtualCores=$(nproc --all)
nWorkers=$(( $nVirtualCores * 1 )) # I want 1 process per core

worker() { # $1 = idWorker
  echo "Worker $1 GO!"
  idAsset=0
  for asset in "${listAssets[@]}"; do
    # split assets among workers (using modulo); each worker will go through
    # the list and select the asset only if it belongs to that worker
    (( idAsset % nWorkers == $1 )) && task $1 "$asset"
    (( idAsset++ ))
  done
  echo "    Worker $1 ALL DONE!"
}

for (( idWorker=0; idWorker<nWorkers; idWorker++ )); do
  # start workers in parallel, use 1 process for each
  worker $idWorker &
done
wait # until all workers are done
Answered By: geekley

I really like the answer from @lev, as it provides control over the maximum number of processes in a very simple manner. However, as described in the manual, sem does not work with brackets.

for stuff in things
do
sem -j +0 "something; 
  with; 
  stuff"
done
sem --wait

Does the job.

-j +N Add N to the number of CPU cores. Run up to this many jobs in parallel. For compute intensive jobs -j +0 is useful as it will run number-of-cpu-cores jobs simultaneously.

-j -N Subtract N from the number of CPU cores. Run up to this many jobs in parallel. If the evaluated number is less than 1 then 1 will be used. See also --use-cpus-instead-of-cores.
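So, to keep one core free for interactive work, the loop above only needs a different job count:

for stuff in things
do
    sem -j -1 "something; with; stuff"
done
sem --wait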

Answered By: moritzschaefer