2012-12-19

Adding async in Bash

Today I had to wait quite a long time for my small script to convert photos of handwritten pages to DjVu/PDF, so the first idea was to make it use all 4 of my cores. My first thought was to use Bash's existing job control, but unfortunately it is too weak for this: there is no built-in way to cap how many jobs run at once. So here are the changes I had to make to get it to work:
active_jobs=()
max_jobs=4

refresh_jobs() {
    local pid
    local actual_jobs=()
    for pid in "${active_jobs[@]}"; do
        # kill -0 sends no signal; it only checks that the process still exists
        kill -0 "$pid" 2> /dev/null \
          && actual_jobs+=("$pid") \
          || wait "$pid" # avoid zombie
    done
    active_jobs=("${actual_jobs[@]}")
}

trap refresh_jobs SIGCHLD

spawn() {
    [ ${#active_jobs[@]} -ge $max_jobs ] && refresh_jobs
    while [ ${#active_jobs[@]} -ge $max_jobs ]; do
        msg "Waiting for jobs..."
        sleep 5
        refresh_jobs
    done
    "$@" &
    active_jobs+=($!)
}

barrier() {
    wait "${active_jobs[@]}"
    active_jobs=()
}

# ...

msg "Building DjVuDocuments"
for page in "$tmpdir"/page-*.p[pgb]m "$tmpdir"/page-*.jpg; do
    # skip patterns that matched no files and were left unexpanded
    case "$page" in
    *-\*.*) continue ;;
    esac
    djvu="${page%.*}.djvu"
    msg "Encode $page"
    # Serial version, for comparison:
    # cpaldjvu -colors $color -bgwhite "$page" "$djvu" || die "Failed to encode $page"
    # (the spawned version below no longer checks the exit status)
    spawn cpaldjvu -colors "$color" -bgwhite "$page" "$djvu"
done
# now wait for all spawned jobs
barrier
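
One thing the spawn/barrier version gives up is the `|| die` check from the serial loop. Here is a minimal sketch of one way to get it back, assuming it is fine to wait for jobs in the order they were started rather than as they finish. The names `pids`, `failed`, `reap_oldest` and `work` are made up for this example and are not part of the script above, and `work` is only a stand-in for the real encoder call:

```bash
#!/usr/bin/env bash
# Sketch: cap the number of background jobs and count the ones that failed.
max_jobs=4
pids=()     # PIDs of started jobs, oldest first
failed=0    # number of jobs that exited non-zero

# Wait for the oldest job and record a non-zero exit status.
reap_oldest() {
    wait "${pids[0]}" || failed=$((failed + 1))
    pids=("${pids[@]:1}")   # drop the first element
}

# Start a command in the background, blocking while the cap is reached.
spawn() {
    while [ "${#pids[@]}" -ge "$max_jobs" ]; do
        reap_oldest
    done
    "$@" &
    pids+=("$!")
}

# Wait for everything that is still running.
barrier() {
    while [ "${#pids[@]}" -gt 0 ]; do
        reap_oldest
    done
}

# Hypothetical workload: job number 3 fails, the others succeed.
work() {
    sleep 0.1
    [ "$1" -ne 3 ]
}

for i in 1 2 3 4 5 6; do
    spawn work "$i"
done
barrier
echo "failed jobs: $failed"
```

Waiting on each PID individually is what makes the exit status available. The trade-off is that a slow oldest job can keep a slot idle while newer jobs have already finished. In return, the sketch needs neither the `sleep 5` polling loop nor a SIGCHLD trap.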