How to find missing files within a specific date range?

I want to go through all the files in a folder and find out which files are missing for a specific date.

Files are partitioned by hour, and file names have the format yyyy-mm-dd-hh.

So between 2017-07-01 and 2017-07-02 there will be 24 files, from 2017-07-01-00 through 2017-07-01-23.

How can I find the missing hourly files if I pass the above dates as the start and end date?

Appreciate any input!

Asked By: Yu Ni

||

On a GNU system:

#! /bin/bash -
ret=0
start=${1?} end=${2?}
t1=$(date -d "$start" +%s) t2=$(date -d "$end" +%s)

for ((t = t1; t < t2; t += 60*60)); do
  printf -v file '%(%F-%H)T' "$t"
  if [ ! -e "$file" ]; then
    printf >&2 '"%s" not found\n' "$file"
    ret=1
  fi
done
exit "$ret"

Note that on the day of the switch to winter time (in timezones that implement daylight saving), you may get an error message twice if a file is missing for the hour of the switch. Fix $TZ to UTC0 if you want 24 hours per day for every day (for instance, if whatever creates those files uses UTC time instead of local time).
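
As a rough illustration of the UTC suggestion, the sketch below pins TZ to UTC0 inside a subshell so that every day yields exactly 24 names. The function name expected_hours is hypothetical (it is not part of the script above), and the sketch assumes GNU date and bash 4.2 or later for printf's %(...)T format.

```bash
#!/bin/bash
# Hypothetical helper: print the expected hourly file names, computed in UTC.
# The ( ... ) body runs in a subshell, so the TZ change does not leak
# into the calling shell.
expected_hours() (
  export TZ=UTC0
  t1=$(date -d "$1" +%s) || exit
  t2=$(date -d "$2" +%s) || exit
  for ((t = t1; t < t2; t += 60*60)); do
    printf '%(%F-%H)T\n' "$t"
  done
)
```

For example, expected_hours 2017-10-29 2017-10-30 | wc -l counts 24 lines, even when the local timezone has 25 hours on that day.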

Answered By: Stéphane Chazelas

# Presuming that the files are named e.g. template-2017-07-01-16:

# To test a given date
for file in template-2017-07-01-{00..23}; do
  if ! [[ -f "$file" ]]; then
    echo "$file is missing"
  fi
done

# To test a given year
year=2017
for month in $(seq -w 1 12); do
    # Days in month: the last field of the last non-empty line of cal's output
    dim=$( cal $( date -d "$year-$month-01" "+%m %Y" ) | awk 'NF { days=$NF } END { print days }' )
    for day in $(seq -w 1 $dim); do
        for file in template-${year}-${month}-${day}-{00..23}; do
           if ! [[ -f "$file" ]]; then
             echo "$file is missing"
           fi
        done
    done
done
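
The days-in-month step above parses the output of cal. As an alternative sketch (a different technique, not the one used in this answer), GNU date can compute the last day of a month from relative date items. days_in_month is a hypothetical name, and the approach assumes GNU date.

```bash
#!/bin/bash
# Hypothetical helper: number of days in a month, using GNU date arithmetic.
# "first of the month + 1 month - 1 day" is the last day of that month.
# UTC is used so that daylight saving cannot affect the calculation.
days_in_month() {
  local year=$1 month=$2
  TZ=UTC0 date -d "$year-$month-01 +1 month -1 day" +%d
}

# Possible use inside the loop above: dim=$(days_in_month "$year" "$month")
```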
Answered By: DopeGhoti

What about a command like the one below:

 grep -Fvf <(find * -type f \( -name "2017-07-02-00" $(printf " -o -name %s" 2017-07-02-{01..23}) \)) \
           <(printf "%s\n" 2017-07-02-{00..23})
Contents of the directory, as listed by ls:
2017-07-02-01  2017-07-02-06  2017-07-02-08  2017-07-02-14  2017-07-02-19
2017-07-02-04  2017-07-02-07  2017-07-02-11  2017-07-02-15  2017-07-02-22

The output after running the command:

2017-07-02-00
2017-07-02-02
2017-07-02-03
2017-07-02-05
2017-07-02-09
2017-07-02-10
2017-07-02-12
2017-07-02-13
2017-07-02-16
2017-07-02-17
2017-07-02-18
2017-07-02-20
2017-07-02-21
2017-07-02-23

Above, printf generates all 24 possible file names and, through command substitution, also builds the -name arguments that are passed to find. grep then prints the names that are in the generated list but that find did not find.
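
The same "expected names minus existing names" idea can also be sketched with comm in place of find and grep. This is a swapped-in technique, not the command above; missing_for_day is a hypothetical name, and the sketch assumes bash 4 or later for the zero-padded {00..23} expansion.

```bash
#!/bin/bash
# Hypothetical helper: print the hourly names for one day that do not
# exist in the given directory.
# comm -13 keeps only the lines unique to its second input, which here
# is the list of expected names.
missing_for_day() {
  local dir=$1 day=$2
  comm -13 \
    <(cd "$dir" && printf '%s\n' "$day"-[0-9][0-9] | sort) \
    <(printf '%s\n' "$day"-{00..23})
}
```

With the directory listing shown above, missing_for_day . 2017-07-02 should print the same 14 names.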

Answered By: αғsнιη

Why not use egrep? You can then write the regex the way you want.

 egrep '(2017-07-0[1-2]-[0-9][0-9]$)' *file name here* | tail

The regex might be a little off, sorry.

Answered By: baron dune

Usage: ./diff_date.sh 2017-08-30-00 2017-09-02-00

#!/bin/bash

# This preprocessing is needed because `date` requires the format 2017-08-30 00,
# not 2017-08-30-00. So the last (third) dash is replaced by a space here.
start=$(sed 's/-/ /3' <<< "$1")
end=$(sed 's/-/ /3' <<< "$2")

while [[ "$start" != "$end" ]]; do
    # Put the dash back in its place and check whether the file exists.
    if [ ! -f "${start/ /-}" ]; then 
        echo "${start/ /-}"
    fi  
    # Performance could be improved by calling `date` only when the
    # day changes, not every hour.
    start=$(date -d "${start} + 1 hour" "+%F %H")
done

Testing:

# make files
$ touch 2017-08-{30..31}-{03..23}; touch 2017-09-{01..02}-{03..23}
$
$ ./diff_date.sh 2017-08-30-00 2017-09-02-00
##### Output - missing files. #####
2017-08-30-00
2017-08-30-01
2017-08-30-02
2017-08-31-00
2017-08-31-01
2017-08-31-02
2017-09-01-00
2017-09-01-01
2017-09-01-02
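
The comment in the script suggests calling date less often. One way this might look, as a sketch only, is to step through whole days with date and generate the 24 hours of each day with brace expansion. missing_by_day is a hypothetical name. Unlike the script above, it takes plain yyyy-mm-dd dates. It assumes GNU date, bash 4 or later, and an end date later than the start date (otherwise the loop does not terminate).

```bash
#!/bin/bash
# Hypothetical helper: print missing hourly files from start day
# (inclusive) to end day (exclusive), looking in the current directory.
missing_by_day() {
  local day=$1 end=$2 hour
  while [[ "$day" != "$end" ]]; do
    for hour in {00..23}; do
      [[ -f "$day-$hour" ]] || echo "$day-$hour"
    done
    # One date call per day; UTC avoids daylight-saving surprises.
    day=$(TZ=UTC0 date -d "$day + 1 day" +%F) || return
  done
}
```

Usage would be along the lines of missing_by_day 2017-08-30 2017-09-02.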
Answered By: MiniMax