Script to select which result/value to be copied to a text file using AWK

Can someone help me with the last step in my bash script?
You already helped me get this far.

#!/bin/bash

find . -type f \
       -name '*.mp4' -o -name '*.mkv' \
    -o -name '*.avi' -o -name '*.mov' |
while read -r file
do 
    size=$(stat -c %s "$file")
    duration=$(ffprobe -v error -show_entries format=duration -of default=noprint_wrappers=1:nokey=1 "$file")
    codec=$(ffprobe -v error -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 "$file")
    ratio=$(bc <<< "scale=2; $size / $duration")
    echo "$file: codec=$codec, size=$size, duration=$duration, ratio=$ratio" | tee -a /home/user/Downloads/logfile
    printf $ratio | awk '{print $1/1000}'| tee -a /home/user/Downloads/logfile
done

Now ALL results go to the text file.
But is there a way to select only the files with a ratio bigger than …?

As requested by @markp-fuso, I clarified a few points:

please update the question with some samples of what’s in $ratio

$ratio holds numbers generated by dividing the video file size in bytes by the duration of the video file in seconds. These numbers range from about 50k to 1000k, so I use awk '{print $1/1000}' to bring them into the range of 50 to 1000.

and what you’re thinking of using as a cutoff/threshold; are you
looking to filter based on a) the value in $ratio, b) the value
generated by awk or c) the result of the numfmt call?

Good points. I had in mind to use the numfmt call to convert the bytes to megabytes, but that is redundant, so I have deleted it.
The script works up to the point that it generates the wanted output:

./file1.mp4: codec=h264
aac, size=54886926, duration=94.900000, ratio=578365.92
578.366
./file2.mp4: codec=vp9
aac, size=15147100, duration=108.159000, ratio=140044.74
140.045
./file3.mp4: codec=vp9
aac, size=22306731, duration=109.947000, ratio=202886.21
202.886

I will use this to find video/audio files that can be re-encoded (shrunk) because they are large for their duration.
So a file with a high $ratio is a possible candidate for re-encoding. The value can easily be adjusted in the script, but it will be around 200-400,
depending on the codec efficiency (I just added a line to display the codec used as well).

So in the end I would like a text file containing only those files that meet the requirement, in this case a ratio bigger than the set value. I will then make a decision based on experience.

Note: If possible, it would be great if files that cannot be read (e.g. due to being corrupted), and so have no value, were also added to the text file.

Let's assume I set the ratio to 200; then the text file should contain the following, based on the 3 examples above:

./file1.mp4: codec=h264
aac, size=54886926, duration=94.900000, ratio=578365.92
578.366
./file3.mp4: codec=vp9
aac, size=22306731, duration=109.947000, ratio=202886.21
202.886

Any help will be highly appreciated.

Cheers

Asked By: dave999

||

Likely near the top, declare your cutoff value:

# We only care about files with ratios GREATER than this:
cutoff=200000

Then near the bottom of your while loop, wrap the echo and printf commands with a test and an if statement:

    ...
    ratio=$(bc <<< "scale=2; $size / $duration")
    rc=$(bc <<< "$ratio > $cutoff")
    if [[ "$rc" == "1" ]]
    then {
        echo "$file: codec=$codec, size=$size, duration=$duration, ratio=$ratio"
        awk '{print $1/1000}' <<< "$ratio"
    } | tee -a /home/user/Downloads/logfile
    fi
done
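
If bc is not available, or you would rather avoid a second bc call per file, the same floating-point test can be done in awk. This is only a minimal sketch: above_cutoff is a hypothetical helper name, not something from the script above, and the sample ratios are the ones shown in the question.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) when the first number is
# strictly greater than the second. Adding 0 forces awk to compare the
# values as numbers rather than as strings.
above_cutoff() {
    awk -v r="$1" -v c="$2" 'BEGIN { exit !(r + 0 > c + 0) }'
}

cutoff=200000

# Sample ratios (bytes per second) taken from the output in the question.
for ratio in 578365.92 140044.74 202886.21; do
    if above_cutoff "$ratio" "$cutoff"; then
        echo "$ratio is above $cutoff"
    fi
done
```

With these sample values the loop reports 578365.92 and 202886.21 and skips 140044.74. Note that an empty ratio is treated as 0 here, so an unreadable file would be skipped silently unless you test for it separately.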
Answered By: Jim L.

There are various issues in your script that you should fix before addressing the point you’re asking about. First, your find command is wrong and second, this will fail on file names containing newline characters.

Your find is wrong because you aren’t grouping the options. This means that your command will also find directories whose name ends in e.g. .mov. Consider this directory:

$ ls -lF
total 4
-rw-r--r-- 1 terdon terdon    0 Mar 18 18:37 'a bad'$'\n''file name.mp4'
drwxr-xr-x 2 terdon terdon 4096 Mar 18 18:38  foo.mov/

That contains one file, whose name has spaces and a newline, and one directory, foo.mov. You only want to process the files, but your find will also return the directory:

$ find . -type f -name '*.mp4' -o -name '*.mkv' -o -name '*.avi' -o -name '*.mov' 
./foo.mov
./a bad?file name.mp4

You want the -type f to apply to all conditions, and to do that, you need to group them, as an answer to your previous question mentioned:

$ find . -type f \( -name '*.mp4' -o -name '*.mkv' -o -name '*.avi' -o -name '*.mov' \)
./a bad?file name.mp4

As you can see above, grouping the tests with parentheses (which must be escaped as \( and \), or quoted as '(' and ')', to protect them from the shell) makes the command find only files, as desired. The next issue is the newlines. You can address this by telling find to print its results separated by a NUL (\0) byte instead of a newline. GNU find, the default on Linux systems, can do this with -print0; with find implementations that lack it, you can use -exec printf '%s\0' {} + instead.
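
To see the effect of the grouping without touching your real files, here is a small sketch that builds a throwaway directory (the names clip.mp4 and foo.mov are made up for the demonstration) and counts what each form of the command returns:

```shell
#!/bin/bash
# Throwaway directory with one real video file and one directory whose
# name merely looks like a video file.
demo=$(mktemp -d)
touch "$demo/clip.mp4"
mkdir "$demo/foo.mov"

# Ungrouped: -type f binds only to the first -name test, so the
# directory foo.mov is matched by the second branch of the -o.
ungrouped=$(find "$demo" -type f -name '*.mp4' -o -name '*.mov' | grep -c '')

# Grouped: -type f now applies to every -name test.
grouped=$(find "$demo" -type f \( -name '*.mp4' -o -name '*.mov' \) | grep -c '')

echo "ungrouped=$ungrouped grouped=$grouped"
rm -rf "$demo"
```

The ungrouped command should report two results (the file and the directory), the grouped one only the file.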

Here is the error if you don’t handle such names:

$ find . -type f \( -name '*.mp4' -o -name '*.mkv' -o -name '*.avi' -o -name '*.mov' \) | while read -r file; do ls -l "$file"; done
ls: cannot access './a bad': No such file or directory
ls: cannot access 'file name.mp4': No such file or directory

And here’s how to get it right:

$ find . -type f \( -name '*.mp4' -o -name '*.mkv' -o -name '*.avi' -o -name '*.mov' \) -print0 | while IFS= read -r -d '' file; do ls -l "$file"; done
-rw-r--r-- 1 terdon terdon 0 Mar 18 18:37 './a bad'$'\n''file name.mp4'

The IFS= isn't essential here, but it is good practice to use it (see this SO answer for an example of why). The real job is done by the -d '' option, which tells read to use the NUL byte as the input delimiter.
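
If you want to check this for yourself, the following sketch creates a single file with a newline in its name inside a temporary directory and counts how many "files" each loop sees. It needs bash, since read -d '' and process substitution are not available in plain sh, and the counter names naive and safe are just illustrative.

```shell
#!/bin/bash
demo=$(mktemp -d)
touch "$demo/a bad"$'\n'"file name.mp4"

# Newline-delimited: the one file name is split into two bogus names.
naive=0
while read -r file; do
    naive=$((naive + 1))
done < <(find "$demo" -type f -name '*.mp4')

# NUL-delimited: the name arrives intact, so the -f test succeeds.
safe=0
while IFS= read -r -d '' file; do
    [[ -f $file ]] && safe=$((safe + 1))
done < <(find "$demo" -type f -name '*.mp4' -print0)

echo "naive=$naive safe=$safe"
rm -rf "$demo"
```

Feeding the loops with < <(...) instead of a pipe keeps the counters in the current shell, so they are still set after the loop ends.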

Finally, you also need to be able to handle more than one codec returned since that seems quite common, at least with the files I tested with. For example:

$ ffprobe -v error -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 foo.mkv 
hevc
ac3
ass

So pass the output of the ffprobe command through tr '\n' ',' or something to remove the newlines:

$ ffprobe -v error -show_entries stream=codec_name -of default=noprint_wrappers=1:nokey=1 foo.mkv | tr '\n' ','
hevc,ac3,ass,$

(That final $ is my prompt, shown there to indicate there is no trailing newline here.)
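
If the trailing comma bothers you, paste can join the lines instead. This is a sketch, with join_codecs as a made-up helper name and printf standing in for the ffprobe output shown above:

```shell
#!/bin/bash
# Hypothetical helper: join the lines on standard input with commas,
# without leaving a trailing comma.
join_codecs() {
    paste -sd, -
}

# Stand-in for the three-line ffprobe output shown above.
printf 'hevc\nac3\nass\n' | join_codecs
```

This prints hevc,ac3,ass followed by a normal newline. In the script you would pipe the ffprobe command into it in place of tr.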

Now, all that said, since you have the ratio in a variable already, all you need is a simple if to check if it is above your threshold. I don’t understand why you have two ratios ($ratio and $ratio / 1000), it seems more reasonable to me to only use the one you’re actually testing, but that’s your call. Here’s a working script:

#!/bin/bash

threshold=$1
if [ -z "$threshold" ]; then
  echo "No threshold given, using the default value of 200" >&2
  threshold=200
fi

logfile="/home/user/Downloads/logfile"

find . -type f \
  \( -name '*.mp4' -o -name '*.mkv' -o \
     -name '*.avi' -o -name '*.mov' \) -print0 |
 while IFS= read -r -d '' file
 do
    size=$(stat -c %s "$file")
    duration=$(ffprobe -v error -show_entries format=duration \
                       -of default=noprint_wrappers=1:nokey=1 "$file")
    codec=$(ffprobe -v error -show_entries stream=codec_name \
                    -of default=noprint_wrappers=1:nokey=1 "$file" |
              tr '\n' ',')
    ratio=$(bc <<< "scale=2; $size / $duration")

    # Check that a ratio was found, otherwise print an error
    if [[ -z "$ratio" ]]; then
      echo "No ratio found for '$file'" >&2
    else
      ## Not sure why you want two separate values for ratio but...
      ratio2=$(bc <<< "$ratio / 1000")

      if [[ $ratio2 -ge $threshold ]]; then
        printf '%s: codec=%s size=%s, duration=%s, ratio=%s\n' \
               "$file" "$codec" "$size" "$duration" "$ratio" | tee -a "$logfile"
        echo "$ratio2" | tee -a "$logfile"
      fi
    fi
done

You would now run it with the threshold as an argument (or with no argument to default to 200):

script.sh 300

I also made some other minor, mostly aesthetic, changes to the script, as well as adding some basic error handling, but it should do exactly the same thing. The output looks like:

$ foo.sh 200
./file3.mkv: codec=h264,aac, size=764948534, duration=3488.131000, ratio=219300.40
219
./file7.mkv: codec=h264,aac, size=739550128, duration=3542.852000, ratio=208744.29
208
./file5.mkv: codec=h264,aac, size=688337512, duration=3439.637000, ratio=200119.23
200
./file1.mkv: codec=h264,aac, size=883534591, duration=3701.386000, ratio=238703.71
238
./file4.mkv: codec=h264,aac, size=828112726, duration=3769.898000, ratio=219664.49
219
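
The question also asked for unreadable files to end up in the text file rather than only on standard error. One possible way to do that, sketched here under the assumption that "unreadable" simply means the ratio came back empty or non-numeric, is to classify each ratio first and let the loop decide what to log. classify_ratio is a hypothetical helper, not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print "unreadable", "keep" or "skip" for a ratio
# in bytes per second and a threshold in kilobytes per second.
classify_ratio() {
    local ratio=$1 threshold=$2
    case $ratio in
        # Empty, or containing anything other than digits and dots.
        ''|*[!0-9.]*) echo unreadable; return ;;
    esac
    # Same integer value the script calls ratio2 (truncated, not rounded).
    local ratio2
    ratio2=$(awk -v r="$ratio" 'BEGIN { printf "%d", r / 1000 }')
    if [ "$ratio2" -ge "$threshold" ]; then
        echo keep
    else
        echo skip
    fi
}

classify_ratio 578365.92 200
classify_ratio 140044.74 200
classify_ratio "" 200
```

The three calls print keep, skip and unreadable. Inside the while loop you could then send both the keep and the unreadable cases through tee -a "$logfile".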
Answered By: terdon