How to find missing files within a specific date range?
I want to go through all the files in a folder and find the files that are missing for a specific date.
Files are partitioned by hour, and the file names follow the yyyy-mm-dd-hh format.
So between 2017-07-01 and 2017-07-02 there should be 24 files, from 2017-07-01-00 through 2017-07-01-23.
How can I find the missing hourly files if I pass the above dates as the start and end dates?
Appreciate any input!
On a GNU system:
#! /bin/bash -
ret=0
start=${1?} end=${2?}
t1=$(date -d "$start" +%s) t2=$(date -d "$end" +%s)
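# Step through the range one hour (3600 seconds) at a time.
for ((t = t1; t < t2; t += 60*60)); do
  # Format the epoch time as yyyy-mm-dd-hh (needs bash 4.2 or newer).
  printf -v file '%(%F-%H)T' "$t"
  if [ ! -e "$file" ]; then
    printf >&2 '"%s" not found\n' "$file"
    ret=1
  fi
done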
exit "$ret"
Note that on the day of the switch to winter time (in timezones that implement daylight saving), one local hour occurs twice, so you may get the error message twice if the file for that hour is missing. Set $TZ to UTC0 if you want 24 hours per day for every day (for instance, if whatever creates those files uses UTC time instead of local time).
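As a rough illustration of the UTC suggestion, here is a minimal sketch that only generates the expected names, assuming GNU date and bash 4.2 or newer. The function name `expected_names` is made up for this example and is not part of the script above.

```bash
#!/bin/bash
# expected_names START END
# Hypothetical helper: prints one yyyy-mm-dd-hh name per hour in [START, END).
# The body runs in a subshell (note the parentheses) so that exporting TZ
# does not leak into the caller's environment.
expected_names() (
  export TZ=UTC0                      # every day has exactly 24 hours in UTC
  t1=$(date -d "$1" +%s) || exit
  t2=$(date -d "$2" +%s) || exit
  for ((t = t1; t < t2; t += 3600)); do
    printf '%(%F-%H)T\n' "$t"
  done
)

# Example: report the names that do not exist in the current directory.
# expected_names 2017-07-01 2017-07-02 | while IFS= read -r f; do
#   [ -e "$f" ] || printf '%s\n' "$f"
# done
```

Because the names are computed in UTC, this only matches the files on disk if whatever creates them also uses UTC.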
# Presuming that the files are named e.g. template-2017-07-01-16:
# To test a given date
for file in template-2017-07-01-{00..23}; do
  if ! [[ -f "$file" ]]; then
    echo "$file is missing"
  fi
done
# To test a given year
year=2017
for month in $(seq -w 1 12); do
  # Days in month: the last field on the last non-empty line of cal's output.
  dim=$( cal $( date -d "$year-$month-01" "+%m %Y" ) | awk 'NF { days=$NF } END { print days }' )
  for day in $(seq -w 1 "$dim"); do
    for file in template-${year}-${month}-${day}-{00..23}; do
      if ! [[ -f "$file" ]]; then
        echo "$file is missing"
      fi
    done
  done
done
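If cal is not installed, the number of days in a month can probably be obtained from GNU date alone. This is only a sketch under that assumption; the function name `days_in_month` is invented for illustration.

```bash
#!/bin/bash
# days_in_month YEAR MONTH
# Hypothetical helper: take the first day of the month, add one month,
# subtract one day, and print the resulting day of the month.
# Relies on GNU date's relative-date syntax; TZ=UTC0 keeps DST out of it.
days_in_month() {
  TZ=UTC0 date -d "$1-$2-01 +1 month -1 day" +%d
}

# Example: dim=$(days_in_month "$year" "$month")
```

The result could replace the cal and awk pipeline in the loop above, with the rest of the loop unchanged.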
What about a command like the one below:
grep -Fvf <(find * -type f \( -name "2017-07-02-00" $(printf " -o -name %s" 2017-07-02-{01..23}) \)) \
  <(printf "%s\n" 2017-07-02-{00..23})
ls
2017-07-02-01 2017-07-02-06 2017-07-02-08 2017-07-02-14 2017-07-02-19
2017-07-02-04 2017-07-02-07 2017-07-02-11 2017-07-02-15 2017-07-02-22
The output after running the command:
2017-07-02-00
2017-07-02-02
2017-07-02-03
2017-07-02-05
2017-07-02-09
2017-07-02-10
2017-07-02-12
2017-07-02-13
2017-07-02-16
2017-07-02-17
2017-07-02-18
2017-07-02-20
2017-07-02-21
2017-07-02-23
Above, we generate all 24 possible file names with printf and pass them to find as -name arguments (printf also builds the -o -name list for us). Then grep prints the names that are in our generated list but that find did not find.
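The same idea can be written a little more compactly. The following is a sketch rather than a drop-in replacement: it assumes GNU find (for -maxdepth and -printf) and bash 4 or newer, the function name `missing_hours` is made up, and it adds -x so that grep matches whole names only.

```bash
#!/bin/bash
# missing_hours DAY [DIR]
# Hypothetical helper: prints every DAY-hh name (hh = 00..23) for which no
# regular file exists directly inside DIR (default: current directory).
missing_hours() {
  local day=$1 dir=${2:-.}
  # Expected names on stdin; existing names are the fixed-string patterns.
  # -F fixed strings, -x whole-line match, -v invert: keep names not found.
  printf '%s\n' "$day"-{00..23} |
    grep -Fxvf <(find "$dir" -maxdepth 1 -type f -name "$day-[0-9][0-9]" -printf '%f\n')
}

# Example: missing_hours 2017-07-02
```

Note that grep exits with status 1 when nothing is missing, since it then prints no lines.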
Why not use egrep? You can then write the regex the way you want.
egrep '(2017-07-0[1-2]-[0-9][0-9]$)' *file name here* | tail
The regex might be a little off, sorry.
Usage: ./diff_date.sh 2017-08-30-00 2017-09-02-00
#!/bin/bash
# This processing is needed because `date` requires the 2017-08-30 00 format,
# not 2017-08-30-00. So the last dash is replaced by a space here.
start=$(sed 's/-/ /3' <<< "$1")
end=$(sed 's/-/ /3' <<< "$2")
while [[ "$start" != "$end" ]]; do
  # Put the dash back in its place and check whether this file exists.
  if [ ! -f "${start/ /-}" ]; then
    echo "${start/ /-}"
  fi
  # The performance of this code could be improved by calling `date` only
  # when the day changes, not every hour.
  start=$(date -d "${start} + 1 hour" "+%F %H")
done
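The performance remark in the comment could be sketched as follows: call date once per day and let brace expansion produce the 24 hours. This variant is an assumption-laden simplification, not the script above. It only accepts whole days (yyyy-mm-dd) as arguments, needs GNU date and bash 4 or newer, and the function name `list_missing` is invented for the example.

```bash
#!/bin/bash
# list_missing START_DAY END_DAY [DIR]
# Hypothetical helper: prints the missing yyyy-mm-dd-hh names for every day
# in [START_DAY, END_DAY), looking in DIR (default: current directory).
list_missing() {
  local day=$1 end=$2 dir=${3:-.} hour
  # ISO dates sort lexicographically, so a string comparison is enough and
  # the loop simply does not run if START_DAY is not before END_DAY.
  while [[ "$day" < "$end" ]]; do
    for hour in {00..23}; do
      [[ -f "$dir/$day-$hour" ]] || echo "$day-$hour"
    done
    # One external `date` call per day instead of one per hour.
    day=$(TZ=UTC0 date -d "$day + 1 day" +%F) || return
  done
}

# Example: list_missing 2017-08-30 2017-09-02
```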
Testing:
# make files
$ touch 2017-08-{30..31}-{03..23}; touch 2017-09-{01..02}-{03..23}
$
$ ./diff_date.sh 2017-08-30-00 2017-09-02-00
##### Output - missing files. #####
2017-08-30-00
2017-08-30-01
2017-08-30-02
2017-08-31-00
2017-08-31-01
2017-08-31-02
2017-09-01-00
2017-09-01-01
2017-09-01-02